Mastering API Protection: Rate Limiting, Caching, and Degradation for E‑Commerce Spikes

This guide explains how to safeguard e‑commerce APIs when demand for a product suddenly surges, using rate‑limiting algorithms (leaky bucket, token bucket, sliding window), Nginx and Java semaphore controls, distributed throttling with message queues, service degradation strategies, and caching techniques to maintain stability.

MaGe Linux Operations

Feb 19, 2023

Consider a scenario: traffic to a product API suddenly spikes. What should be done?

For example, after the mascot "Bing Dwen Dwen" trended, tens of thousands of users rushed to place orders on Taobao without any cache warm‑up or preparation, leading to overload.

In high‑concurrency e‑commerce systems, protecting an interface typically involves three measures: caching, rate limiting, and service degradation.

Assume the interface already sits behind risk control, which has filtered out the bot requests (say, half of the traffic), leaving only genuine user orders.

Service Rate Limiting

Rate limiting aims to throttle concurrent requests either by limiting request speed or by limiting the number of requests within a time window; once the limit is reached, the service can reject, queue, wait, or degrade the request.

Rate Limiting Algorithms

1. Leaky Bucket Algorithm

The leaky bucket algorithm puts incoming requests into a bucket; if the bucket is full (reaching the limit), requests are discarded or handled by other strategies. The bucket releases requests at a fixed rate, ensuring the service consumption speed never exceeds the defined threshold.

The idea is that regardless of how many requests arrive, the interface’s consumption speed is always less than or equal to the outflow rate.

This can be implemented using a message queue.
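The idea can also be sketched in-process as a "meter" variant of the leaky bucket: rather than queuing requests, it tracks a water level that drains at a fixed rate and rejects a request when the bucket would overflow. The class name `LeakyBucket`, its parameters, and the caller-supplied timestamps are assumptions made for illustration, not part of any particular library.

```java
// Hypothetical leaky-bucket meter; names and parameters are illustrative.
class LeakyBucket {
    private final long capacity;        // maximum number of requests the bucket holds
    private final double leakPerMilli;  // outflow rate in requests per millisecond
    private double water = 0;           // current bucket level
    private long lastLeakMillis;        // last time the level was updated

    LeakyBucket(long capacity, double leakPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.leakPerMilli = leakPerSecond / 1000.0;
        this.lastLeakMillis = nowMillis;
    }

    // Returns true if the request fits in the bucket, false if it would overflow.
    synchronized boolean tryAccept(long nowMillis) {
        // Drain the bucket at the fixed rate for the time that has elapsed.
        double leaked = (nowMillis - lastLeakMillis) * leakPerMilli;
        water = Math.max(0, water - leaked);
        lastLeakMillis = nowMillis;
        if (water + 1 > capacity) {
            return false; // bucket full: discard or apply another strategy
        }
        water += 1;
        return true;
    }
}
```

In production code the timestamp would typically come from `System.currentTimeMillis()`; passing it in keeps the sketch deterministic.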

2. Token Bucket Algorithm

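The token bucket algorithm adds one token to a bucket every v (v = time period / limit), that is, at a rate of limit / time period, up to the bucket's capacity. For example, a limit of 100 requests per second means one token every 10 ms. When a request arrives, it tries to take a token; if successful, the request passes, otherwise the limit strategy is triggered.

The difference from the leaky bucket is that the token bucket allows bursty traffic: tokens accumulate while the interface is idle, so a burst of up to the bucket's capacity can pass at once.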
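A minimal token bucket sketch follows. The class `TokenBucket`, its constructor parameters, and the caller-supplied timestamps are hypothetical and chosen only for illustration; tokens are refilled lazily on each call rather than by a background thread.

```java
// Hypothetical token bucket; names and parameters are illustrative.
class TokenBucket {
    private final long capacity;          // maximum tokens, i.e. the largest burst
    private final double refillPerMilli;  // tokens added per millisecond
    private double tokens;                // tokens currently available
    private long lastRefillMillis;        // last time tokens were topped up

    TokenBucket(long capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = refillPerSecond / 1000.0;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillMillis = nowMillis;
    }

    // Returns true if a token was taken, false if the bucket is empty.
    synchronized boolean tryAcquire(long nowMillis) {
        // Add the tokens earned since the last call, capped at capacity.
        double added = (nowMillis - lastRefillMillis) * refillPerMilli;
        tokens = Math.min(capacity, tokens + added);
        lastRefillMillis = nowMillis;
        if (tokens < 1) {
            return false; // no token: trigger the limit strategy
        }
        tokens -= 1;
        return true;
    }
}
```

With a capacity of 3 and a refill rate of one token per second, three requests arriving together all pass, which a leaky bucket with the same outflow rate would not allow.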

3. Sliding Window Algorithm

The sliding window algorithm divides a time period into N small intervals, records the request count for each interval, and discards expired intervals as time slides.

For example, with a 1‑minute window split into two 30‑second sub‑windows, the first sub‑window may record 75 requests and the second 100. If the sum of all sub‑windows (here 175) exceeds the threshold (e.g., 100), the limit strategy is triggered.

Sentinel uses a sliding window for its flow‑control statistics; the TCP sliding window applies a related idea to flow control between a sender and a receiver.
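The sub‑window bookkeeping can be sketched with a small ring of counters. This is an illustrative sketch, not how Sentinel implements it; the class `SlidingWindowLimiter`, its parameters, and the caller-supplied timestamps are assumptions.

```java
import java.util.Arrays;

// Hypothetical sliding-window counter; names and parameters are illustrative.
class SlidingWindowLimiter {
    private final int limit;           // maximum requests across the whole window
    private final long bucketMillis;   // length of one sub-window
    private final long[] bucketStart;  // start time of the data held in each slot
    private final int[] counts;        // request count per slot

    SlidingWindowLimiter(int limit, long windowMillis, int buckets) {
        this.limit = limit;
        this.bucketMillis = windowMillis / buckets;
        this.bucketStart = new long[buckets];
        this.counts = new int[buckets];
        Arrays.fill(bucketStart, -1); // -1 marks a slot that has never been used
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long currentStart = nowMillis - (nowMillis % bucketMillis);
        int slot = (int) ((nowMillis / bucketMillis) % counts.length);
        if (bucketStart[slot] != currentStart) {
            // The slot holds an expired sub-window: reuse it for the current one.
            bucketStart[slot] = currentStart;
            counts[slot] = 0;
        }
        long windowMillis = bucketMillis * counts.length;
        int total = 0;
        for (int i = 0; i < counts.length; i++) {
            // Count only sub-windows that still fall inside the sliding window.
            if (bucketStart[i] >= 0 && currentStart - bucketStart[i] < windowMillis) {
                total += counts[i];
            }
        }
        if (total >= limit) {
            return false; // threshold reached: trigger the limit strategy
        }
        counts[slot]++;
        return true;
    }
}
```

More sub‑windows give a smoother limit at the cost of a little more memory and a longer summation loop.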

Ingress Layer Rate Limiting

Nginx Rate Limiting

Nginx uses the leaky bucket algorithm for rate limiting.

It can limit access based on client characteristics such as IP or User‑Agent. IP is more reliable because the source address of an established TCP connection is hard to forge, whereas User‑Agent can be easily spoofed.

IP‑based limiting is provided by the limit_req module; see "Module ngx_http_limit_req_module" in the Nginx documentation.

Tengine: see "ngx_http_limit_req_module" in the Tengine Web Server documentation.

Local Interface Rate Limiting

Semaphore

Java’s Semaphore from the concurrency library can easily control the number of simultaneous accesses to a resource. It acquires a permit before processing and releases it afterward.

Example:

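import java.util.concurrent.Semaphore;

// Fair semaphore: at most 40 threads run the business logic at once.
private final Semaphore permit = new Semaphore(40, true);

public void process() {
    try {
        permit.acquire(); // blocks until a permit is available
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt flag
        return; // no permit was acquired, so there is nothing to release
    }
    try {
        // TODO: handle business logic
    } finally {
        permit.release(); // release only after a successful acquire
    }
}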

For how permits are tracked internally, refer to the source code of java.util.concurrent.Semaphore, which is built on AbstractQueuedSynchronizer.
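The example above blocks until a permit is free. During a spike it may be preferable to give up after a short wait and reject or degrade the request instead. The following variant is a sketch under that assumption; the class `BoundedHandler` and the method `tryProcess` are hypothetical names, while `Semaphore.tryAcquire(long, TimeUnit)` is the standard JDK call.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper around a Semaphore; names are illustrative.
class BoundedHandler {
    private final Semaphore permits;

    BoundedHandler(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent, true);
    }

    // Returns false if no permit became available within the timeout.
    boolean tryProcess(Runnable businessLogic, long timeoutMillis) {
        boolean acquired = false;
        try {
            acquired = permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
            if (!acquired) {
                return false; // over the concurrency limit: reject or degrade
            }
            businessLogic.run();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            return false;
        } finally {
            if (acquired) {
                permits.release(); // release only what was actually acquired
            }
        }
    }
}
```

A caller can map a `false` result to an HTTP 429 response or to a degraded fallback, depending on the interface.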

Distributed Interface Rate Limiting

Using Message Queues

Either a dedicated MQ middleware or a Redis List used as a queue can serve as a buffer that follows the leaky bucket principle.

When request volume reaches a certain threshold, a message queue can buffer incoming data and consume it according to the service’s throughput.
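The buffering behaviour can be sketched in a single process with a bounded queue standing in for the MQ or Redis List. The class `RequestBuffer` and its methods are hypothetical; a real deployment would replace the in-memory queue with the chosen middleware's producer and consumer clients.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// In-process stand-in for an MQ or Redis List; names are illustrative.
class RequestBuffer {
    private final BlockingQueue<String> queue;

    RequestBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Producer side: returns false when the buffer is full (the bucket overflows).
    boolean submit(String orderId) {
        return queue.offer(orderId);
    }

    // Consumer side: take at most maxPerTick requests, matching service throughput.
    List<String> consumeTick(int maxPerTick) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, maxPerTick);
        return batch;
    }
}
```

Calling `consumeTick` from a scheduler at a fixed interval gives the fixed outflow rate; the queue capacity bounds how much of the surge is absorbed before requests are rejected.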

Service Degradation

After risk control, if the request concurrency rises sharply, a fallback plan can be activated to degrade the service.

Degradation is typically applied to services or tasks that are not critical or urgent, allowing them to be delayed or paused.

Degradation Strategies

Stop Edge Services

For example, during Taobao’s Double‑11 promotion, queries for orders older than three months might be disabled to preserve core service availability.

Reject Requests

When request volume exceeds the threshold or many failures occur, some requests can be outright rejected.

Rejection Policies

Random rejection: randomly drop requests that exceed the limit.

Reject older requests: prioritize newer requests and drop earlier ones.

Reject non‑core requests: maintain a whitelist of core services and reject everything else.
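The three policies can be sketched as small helper methods. The class `RejectionPolicies`, the method names, and the idea of identifying a request by its path are assumptions made for illustration.

```java
import java.util.Deque;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helpers for the rejection policies; names are illustrative.
class RejectionPolicies {
    // Random rejection: drop roughly dropRatio (0.0 to 1.0) of the requests.
    static boolean rejectRandomly(double dropRatio) {
        return ThreadLocalRandom.current().nextDouble() < dropRatio;
    }

    // Reject older requests: when the backlog is full, drop the oldest entry.
    // Returns the dropped request, or null if nothing was dropped.
    static <T> T enqueueDroppingOldest(Deque<T> backlog, int capacity, T request) {
        T dropped = null;
        if (backlog.size() >= capacity) {
            dropped = backlog.pollFirst();
        }
        backlog.addLast(request);
        return dropped;
    }

    // Reject non-core requests: only whitelisted paths pass while degraded.
    static boolean rejectNonCore(String path, Set<String> coreWhitelist) {
        return !coreWhitelist.contains(path);
    }
}
```

The `enqueueDroppingOldest` helper is not thread-safe on its own; it assumes the caller synchronizes access to the backlog.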

Recovery Strategies

After degradation, additional consumer services can be registered to absorb the surge, and load can be gradually restored to the affected servers.

Specific implementation details can be found in related articles.

Data Caching

When a protected interface experiences a sudden surge, the following steps can be taken:

1. Use a distributed lock to block access.

2. Cache hot data in a caching middleware during the short burst.

3. After releasing the lock, prioritize operations on cached data.

4. Send the operation results to a consumer via a message queue for asynchronous processing.
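The steps above can be sketched in a single process. Everything here is a stand-in chosen for illustration: a `ReentrantLock` replaces the distributed lock, a `ConcurrentHashMap` replaces the caching middleware, a `ConcurrentLinkedQueue` replaces the message queue, and the class `HotItemService` and its method names are hypothetical.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// In-process stand-ins only: lock, cache, and queue are assumptions that
// replace a distributed lock, a caching middleware, and a message queue.
class HotItemService {
    private final Map<String, Integer> database; // source of truth
    private final Map<String, AtomicInteger> cache = new ConcurrentHashMap<>();
    private final Queue<String> outbox = new ConcurrentLinkedQueue<>();
    private final ReentrantLock warmUpLock = new ReentrantLock();

    HotItemService(Map<String, Integer> database) {
        this.database = database;
    }

    boolean placeOrder(String sku, String userId) {
        AtomicInteger stock = cache.get(sku);
        if (stock == null) {
            // Block access with the lock while the hot data is loaded into the cache.
            warmUpLock.lock();
            try {
                stock = cache.computeIfAbsent(sku,
                        k -> new AtomicInteger(database.getOrDefault(k, 0)));
            } finally {
                warmUpLock.unlock();
            }
        }
        // After the lock is released, operate on cached data; never go below zero.
        int before = stock.getAndUpdate(n -> n > 0 ? n - 1 : n);
        if (before <= 0) {
            return false; // sold out
        }
        // Hand the result to a consumer for asynchronous persistence.
        outbox.add(sku + ":" + userId);
        return true;
    }

    Queue<String> outbox() {
        return outbox;
    }
}
```

A separate consumer would drain `outbox()` and write the orders back to the database at its own pace.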

Cache Issues

Assume an inventory interface has only 100 items in the database. If all requests start hitting the cache, the cache can still become a bottleneck.

Read‑Write Separation

One approach is read‑write separation using Redis master‑replica replication managed by Sentinel. Reads far outnumber writes, so replicas absorb the read traffic; once inventory reaches zero, requests can fail fast on the read path without attempting a write.

Load Balancing

Another idea is to split the inventory across multiple cache instances. Inspired by the counterCells array in ConcurrentHashMap, 100 items could be divided into 10 caches, each handling 10 items, with requests load‑balanced among them.

However, if most users hash to the same cache, that cache sells out while the others sit idle with stock remaining, leading to inaccurate “out‑of‑stock” responses.
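One possible mitigation, which is an assumption of this sketch rather than something the article prescribes, is to fall back to the other shards before reporting "out of stock". The class `ShardedInventory` is hypothetical, and an `AtomicIntegerArray` stands in for the separate cache instances.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sharded stock counter; each array slot stands in for one cache.
class ShardedInventory {
    private final AtomicIntegerArray shards;

    ShardedInventory(int totalStock, int shardCount) {
        shards = new AtomicIntegerArray(shardCount);
        for (int i = 0; i < shardCount; i++) {
            // Spread stock evenly; the first shards absorb any remainder.
            int extra = i < totalStock % shardCount ? 1 : 0;
            shards.set(i, totalStock / shardCount + extra);
        }
    }

    // Try the caller's home shard first, then probe the remaining shards.
    boolean tryDeduct(int userHash) {
        int n = shards.length();
        int home = Math.floorMod(userHash, n);
        for (int i = 0; i < n; i++) {
            int idx = (home + i) % n;
            int before = shards.getAndUpdate(idx, s -> s > 0 ? s - 1 : s);
            if (before > 0) {
                return true; // deducted one item from this shard
            }
        }
        return false; // every shard is empty: genuinely out of stock
    }
}
```

With real cache instances each probe is a network round trip, so the fallback trades some latency near sell‑out for accurate stock answers.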

Page Cache

Many software architectures use a page‑cache approach, similar to how the Linux kernel buffers disk writes or how MySQL flushes dirty pages: short‑term write operations are aggregated and performed in the cache before being persisted in batches.
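The aggregation idea can be sketched as a write‑back counter that merges updates in memory and persists them in one pass. The class `WriteBackCounter` and its methods are hypothetical, and a plain `Map` stands in for the database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical write-back buffer; a plain Map stands in for the database.
class WriteBackCounter {
    private final Map<String, Integer> pending = new HashMap<>(); // aggregated deltas
    private final Map<String, Integer> store;                     // persistent store stand-in
    private int storeWrites = 0;                                  // writes issued to the store

    WriteBackCounter(Map<String, Integer> store) {
        this.store = store;
    }

    // Record a change in memory only; repeated changes to a key are merged.
    synchronized void add(String key, int delta) {
        pending.merge(key, delta, Integer::sum);
    }

    // Persist one aggregated write per key, then clear the buffer.
    synchronized void flush() {
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            store.merge(e.getKey(), e.getValue(), Integer::sum);
            storeWrites++;
        }
        pending.clear();
    }

    synchronized int storeWrites() {
        return storeWrites;
    }
}
```

Changes that have not been flushed are lost if the process crashes, so the flush interval is a trade‑off between write load and durability.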
