How to Rescue a System When QPS Jumps 100× in 10 Minutes

When a service experiences a sudden 100-fold QPS surge, this guide walks through the immediate emergency measures: rate limiting, circuit breaking, and traffic shedding. It then covers systematic analysis of where the traffic came from, architectural redesign (horizontal scaling, microservice decomposition, sharding, connection pooling, caching, and asynchronous processing), and finally stress testing to verify resilience.

ITPUB

Introduction

This article presents a typical ByteDance interview scenario: a backend system suddenly faces a traffic spike of 100 times the normal QPS. It outlines a comprehensive approach for handling the emergency, analyzing the cause, redesigning the architecture, and validating the solution.

1. Emergency Response – Quick Stop‑Bleeding

1.1 Rate Limiting

Immediately discard excess requests to protect the system. Common implementations include:

Guava RateLimiter for single‑node limiting.

Redis for distributed limiting.

Alibaba’s open-source Sentinel for cluster-wide flow control.

Token‑bucket or leaky‑bucket algorithms to enforce a request rate and drop requests that exceed the threshold.

Rate limiting controls the request rate at a network interface, preventing DoS attacks and limiting web crawlers. It ensures system stability under high concurrency.
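As a rough sketch of the token-bucket idea (not Guava’s or Sentinel’s actual implementation, and written in Python for brevity), the limiter refills tokens at a fixed rate and drops any request that arrives when the bucket is empty:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: refills at `rate` tokens/s, capped at `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def try_acquire(self, n: int = 1) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False  # over the limit: caller should drop or queue the request
```

The capacity parameter sets how large a burst is tolerated before requests start being rejected; a leaky bucket differs only in that it smooths output to a constant rate instead of permitting bursts.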
Rate limiting illustration

1.2 Circuit Breaking & Degradation

Circuit breaking protects a distributed system from cascading failures (service avalanche) by quickly failing non‑critical services.

Circuit Breaker: enable for non-core services (e.g., recommendations, comments) using tools like Hystrix to fail fast and free resources for core paths (payment, orders).

Service Degradation: shut down non-essential features (e.g., analytics, verbose logging) and return fallback data (e.g., cached product info) to reduce backend pressure.
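A minimal sketch of the breaker state machine (a toy analogue of what Hystrix does, not its real API): after a threshold of consecutive failures the circuit opens and every call short-circuits to the fallback until a reset timeout elapses.

```python
import time

class CircuitBreaker:
    """Toy circuit breaker: opens after `max_failures` consecutive errors,
    then fails fast until `reset_timeout` seconds have passed (half-open probe)."""

    def __init__(self, max_failures: int = 3, reset_timeout: float = 30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                return fallback()      # open: fail fast, protect the core path
            self.opened_at = None      # half-open: let one probe call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback()
        self.failures = 0              # success closes the circuit again
        return result
```

The key property is that while the circuit is open, the failing dependency is never called at all, so its threads and connections are freed for the core flow.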

1.3 Elastic Scaling

Scaling Out: add instances or read replicas (e.g., more MySQL/Redis replicas) to spread the load; temporarily upgrading instance specifications (scaling up) can also buy headroom.

Traffic Shifting: for multi-datacenter deployments, route traffic away from an overloaded region to one with spare capacity.

1.4 Message Queue Smoothing

During high‑traffic events such as Double‑11 sales, introduce a message queue to absorb bursts. Example: if the system can process 2k requests/s but receives 5k, the queue allows the application to pull 2k requests per second, preventing overload.
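The 2k-versus-5k arithmetic can be simulated in a few lines (a simplified model, not a real broker such as Kafka or RocketMQ): bursts pile up in the queue as backlog while the consumer drains a fixed amount per second.

```python
from collections import deque

def smooth(burst_sizes, capacity_per_sec):
    """Simulate peak shaving: each tick, `burst_sizes[i]` requests arrive and the
    application drains at most `capacity_per_sec`, so backlog absorbs the burst."""
    queue = deque()
    processed, backlog = [], []
    for incoming in burst_sizes:
        queue.extend(range(incoming))            # producers enqueue the burst
        done = min(capacity_per_sec, len(queue)) # consumer pulls at a steady rate
        for _ in range(done):
            queue.popleft()
        processed.append(done)
        backlog.append(len(queue))
    return processed, backlog
```

Two seconds of 5k/s input against a 2k/s consumer leaves a 6k backlog that drains over the following three seconds; the service itself never sees more than 2k requests per second.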

Message queue diagram

2. Calm Analysis – Why Did Traffic Spike?

Determine whether the surge is legitimate (e.g., promotional campaigns) or abnormal (bugs, malicious attacks). Actions include:

Analyze logs and monitoring data; if a bug, assess impact and fix quickly.

If malicious, block IPs, add to blacklist, and apply risk‑control rules.

If a normal promotion, evaluate the scope (single API vs. all APIs), time window, and whether system metrics (CPU, memory, disk) indicate a bottleneck that requires urgent handling.

3. Design Phase – Building a Robust System

3.1 Horizontal Scaling (Divide‑and‑Conquer)

Deploy multiple instances instead of a single server to avoid single‑point failures and increase overall concurrency.

3.2 Microservice Decomposition

Split a monolith into independent services (e.g., user, order, product) to distribute load and improve scalability.

Microservice diagram

3.3 Database Sharding & Partitioning

High traffic can cause MySQL “too many connections” errors. Mitigate by:

Splitting data across multiple databases.

Partitioning large tables (e.g., when a table reaches tens of millions of rows) to improve query performance.
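One common routing scheme (a hypothetical sketch; the database names, table counts, and `orders` table are illustrative) hashes the sharding key into a database number and a table suffix, so all rows for one user land on the same shard:

```python
def shard_for(user_id: int, num_dbs: int = 4, tables_per_db: int = 8) -> str:
    """Map a sharding key to 'db_<i>.orders_<j>' across num_dbs * tables_per_db slots."""
    slot = user_id % (num_dbs * tables_per_db)   # stable slot for this key
    db = slot // tables_per_db                   # which database instance
    table = slot % tables_per_db                 # which table within it
    return f"db_{db}.orders_{table}"
```

Keeping the total slot count fixed (and resharding by doubling it) avoids moving every row when capacity is added; production systems typically push this logic into middleware such as ShardingSphere or MyCat rather than application code.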

3.4 Connection Pooling

Use database, HTTP, or Redis connection pools to reuse connections instead of creating and destroying them per request, thereby reducing latency and increasing throughput.
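The pooling pattern itself is small (a minimal sketch, not a production pool like HikariCP, which adds health checks, timeouts, and sizing heuristics): connections are created once up front and handed out and returned, never opened per request.

```python
import queue

class ConnectionPool:
    """Minimal fixed-size pool: `factory` is called `size` times at startup,
    and the same connections are reused for every subsequent request."""

    def __init__(self, factory, size: int = 10):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self, timeout: float = 1.0):
        # Blocks until a connection is free; raises queue.Empty on timeout,
        # which doubles as natural backpressure under overload.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)
```

Because a TCP handshake plus authentication can cost milliseconds while a pooled checkout costs microseconds, the savings compound directly into throughput at high QPS.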

3.5 Caching

Leverage caches (Redis, JVM local caches, Memcached) to serve frequently accessed data; a single Redis instance can comfortably absorb tens of thousands of requests per second that would otherwise hit the database.
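The usual read path is cache-aside, sketched below with plain dicts standing in for Redis and MySQL (the `get_product` name and data shapes are illustrative): check the cache, fall back to the database on a miss, then populate the cache for the next reader.

```python
def get_product(product_id, cache, db):
    """Cache-aside read: cache hit short-circuits; a miss reads the source of
    truth and back-fills the cache so subsequent requests are served from memory."""
    value = cache.get(product_id)
    if value is not None:
        return value                 # cache hit: no database round trip
    value = db[product_id]           # cache miss: read from the source of truth
    cache[product_id] = value        # back-fill; a real cache would also set a TTL
    return value
```

In production this needs a TTL (and ideally jitter) to avoid a thundering herd of simultaneous expirations, plus cache invalidation or update on writes, but the hit/miss/back-fill shape is the core of it.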

3.6 Asynchronous Processing

Asynchronous calls allow the caller to continue without waiting for the callee to finish, preventing thread blockage under high concurrency.

Implement asynchronous flows with message queues: enqueue bursty requests (e.g., flash‑sale orders), respond instantly to users, and process the queue in the background, releasing resources for additional traffic.
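A stripped-down version of that flow, using a thread and an in-process queue as a stand-in for a real broker (the function and field names are illustrative): the request path enqueues and acknowledges immediately, while a background worker drains the queue at its own pace.

```python
import queue
import threading

orders = queue.Queue()
processed = []

def worker():
    # Background consumer: drains the queue independently of request handling.
    while True:
        order = orders.get()
        if order is None:          # sentinel tells the worker to stop
            break
        processed.append(order)    # stand-in for the real order pipeline
        orders.task_done()

def submit_order(order_id):
    """Request path: enqueue and return at once instead of processing inline."""
    orders.put(order_id)
    return {"order_id": order_id, "status": "accepted"}  # instant ack to the user

threading.Thread(target=worker, daemon=True).start()
```

The user-facing latency is now just the enqueue cost, and the worker count can be tuned to the backend's real capacity, which is exactly the peak-shaving behavior described above.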

4. Stress Testing – Verifying Capacity

Before launch, conduct load testing (e.g., with LoadRunner or JMeter) to identify the maximum concurrent load, pinpoint bottlenecks across network, Nginx, services, or caches, and guide capacity planning.

5. Final Checklist

Apply rate limiting, circuit breaking, scaling, and traffic‑shaping to quickly stop the bleed.

After stabilizing, diagnose the root cause (bug, attack, or legitimate promotion).

Strengthen the system through horizontal scaling, service splitting, sharding, pooling, caching, asynchronous processing, and thorough stress testing.

Design fallback strategies for critical components (e.g., distributed locks, optimistic locks, data verification) to ensure graceful degradation.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: microservices, sharding, caching, stress testing, rate limiting, horizontal scaling, circuit breaking
Written by ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.