
Rate Limiting, Service Degradation, and Caching Strategies for High‑Concurrency E‑commerce Interfaces

This article explains how to protect high‑traffic e‑commerce APIs using caching, rate limiting (leaky‑bucket, token‑bucket, and sliding‑window algorithms), local semaphore controls, distributed message‑queue buffering, service degradation tactics, and recovery plans, so that the system stays stable during sudden traffic spikes.

Architecture Digest

Service Rate Limiting

Rate limiting aims to control the speed of concurrent requests either by limiting the request rate per time window or by throttling the overall concurrency; once the limit is reached, the system can reject, queue, wait, or downgrade the request.

Rate‑Limiting Algorithms

Leaky Bucket Algorithm Incoming requests are placed into a bucket; when the bucket is full, excess requests are discarded or trigger a limiting strategy. The bucket releases requests at a fixed rate, ensuring the output rate never exceeds the configured threshold.
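The leaky bucket can be sketched as a bounded queue drained at a fixed rate by an external scheduler. This is a minimal illustration; the class name, capacity, and the assumption that a timer calls `leakOne()` at the configured rate are all made up for the example:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Minimal leaky-bucket sketch: requests queue up in a fixed-size bucket and
// drain at a fixed rate driven by an external clock tick (e.g. a
// ScheduledExecutorService invoking leakOne() every N milliseconds).
class LeakyBucket {
    private final int capacity;
    private final Queue<Runnable> bucket = new ArrayDeque<>();

    LeakyBucket(int capacity) { this.capacity = capacity; }

    // Offer a request; a full bucket means the request is discarded
    // (or some other limiting strategy is applied by the caller).
    synchronized boolean offer(Runnable request) {
        if (bucket.size() >= capacity) return false;
        bucket.add(request);
        return true;
    }

    // Called at a fixed rate: release exactly one buffered request,
    // so the output rate never exceeds the tick rate.
    synchronized void leakOne() {
        Runnable next = bucket.poll();
        if (next != null) next.run();
    }
}
```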

Token Bucket Algorithm Tokens are added to a bucket at a fixed rate v = limit / time‑period. When a request arrives, it consumes a token; if no token is available, the request is limited. Because unused tokens accumulate up to the bucket's capacity, this algorithm tolerates short bursts of traffic.
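A minimal token-bucket sketch in Java (class name and parameters are illustrative; production systems often reach for a library such as Guava's RateLimiter instead):

```java
// Minimal token-bucket sketch: tokens refill continuously at
// v = limit / period, capped at the bucket capacity (the burst size).
class TokenBucket {
    private final long capacity;        // max tokens, i.e. allowed burst
    private final double refillPerNano; // tokens per nanosecond
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity;         // start full so an initial burst passes
        this.lastRefill = System.nanoTime();
    }

    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;                   // no token available: limit the request
    }
}
```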

Sliding‑Window Algorithm The time window is divided into N sub‑intervals; each sub‑interval records its request count, and old sub‑intervals are removed as the window slides. If the sum of counts in the current window exceeds the threshold, limiting is triggered.
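The sliding-window counter can be sketched with a fixed array of sub-interval slots that are reset as they slide out of the window. Threshold, window length, and N below are illustrative, and the timestamp is passed in explicitly to keep the sketch deterministic:

```java
// Sliding-window counter sketch: the window is split into N sub-intervals;
// a slot is reset when its tick has slid out of the current window.
class SlidingWindow {
    private final int threshold;
    private final long subIntervalMillis;
    private final int subIntervals;   // N
    private final long[] counts;      // requests counted per slot
    private final long[] stamps;      // tick at which each slot last counted

    SlidingWindow(int threshold, long windowMillis, int subIntervals) {
        this.threshold = threshold;
        this.subIntervals = subIntervals;
        this.subIntervalMillis = windowMillis / subIntervals;
        this.counts = new long[subIntervals];
        this.stamps = new long[subIntervals];
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long tick = nowMillis / subIntervalMillis;
        int slot = (int) (tick % subIntervals);
        if (stamps[slot] != tick) {   // slot slid out of the window: reset it
            stamps[slot] = tick;
            counts[slot] = 0;
        }
        long total = 0;
        for (int i = 0; i < subIntervals; i++) {
            if (tick - stamps[i] < subIntervals) total += counts[i]; // still in window
        }
        if (total >= threshold) return false; // window total over threshold: limit
        counts[slot]++;
        return true;
    }
}
```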

Access‑Layer Rate Limiting

Nginx Rate Limiting

Nginx implements rate limiting with the leaky‑bucket algorithm via the limit_req module, which can key its limits on the client IP or the User‑Agent header; the IP key is the more reliable choice, since User‑Agent is trivially spoofed.
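A minimal limit_req configuration might look like the following; the zone name, rate, and burst values here are illustrative, not recommendations:

```nginx
http {
    # 10 requests/second per client IP, state kept in a 10 MB shared zone
    limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

    server {
        location /api/ {
            # absorb short bursts of up to 20 requests; reject the rest
            limit_req zone=perip burst=20 nodelay;
            limit_req_status 503;
        }
    }
}
```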

Local Interface Rate Limiting

Semaphore

Java's Semaphore can control the number of concurrent accesses to a resource. Example code:

import java.util.concurrent.Semaphore;

private final Semaphore permit = new Semaphore(40, true); // at most 40 concurrent callers, fair ordering

public void process() {
    try {
        permit.acquire();
        try {
            // TODO: handle business logic
        } finally {
            permit.release(); // release only a permit that was actually acquired
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt flag instead of swallowing it
    }
}

Distributed Interface Rate Limiting

Using Message Queues

Both MQ middleware and Redis List‑based queues can serve as buffering queues built on the leaky‑bucket principle: when request volume reaches a threshold, the queue absorbs incoming traffic, which is then consumed at the service's actual throughput.
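As a local illustration of the buffering idea, a bounded queue can absorb a burst while a consumer drains it at the service's own pace. In a real deployment the queue would be an MQ topic or a Redis List shared across nodes; the class and method names here are made up:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of leaky-bucket-style buffering with a bounded in-process queue.
class RequestBuffer {
    private final BlockingQueue<String> queue;

    RequestBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Producer side: non-blocking offer. A full queue means the burst exceeds
    // what the buffer can absorb, so the caller should degrade or reject.
    boolean submit(String requestId) {
        return queue.offer(requestId);
    }

    // Consumer side: drain at the service's own throughput.
    String takeNext() throws InterruptedException {
        return queue.take();
    }

    int backlog() { return queue.size(); }
}
```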

Service Degradation

When an interface’s request concurrency rises sharply and risk controls are triggered, a prepared fallback plan can be activated to degrade non‑critical services.

Typical degradation actions include delaying or pausing low‑priority services, stopping edge‑case features (e.g., disabling queries for orders older than three months during a sales event), or outright rejecting requests.

Rejection strategies:

Random rejection of excess requests.

Reject older requests first.

Reject non‑core requests based on a predefined core‑service list.
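The third strategy can be sketched as a gate consulted before each request is handled: under degradation, only services on a predefined core‑service list are admitted. Class and service names below are illustrative:

```java
import java.util.Set;

// Under degradation, reject any request whose service is not on the
// predefined core-service whitelist; in normal operation admit everything.
class CoreServiceGate {
    private final Set<String> coreServices;
    private volatile boolean degraded;

    CoreServiceGate(Set<String> coreServices) {
        this.coreServices = coreServices;
    }

    void setDegraded(boolean degraded) { this.degraded = degraded; }

    boolean admit(String service) {
        return !degraded || coreServices.contains(service);
    }
}
```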

Recovery Plan

After degradation ends, additional consumer services can be registered to work off the remaining traffic, and slow‑start (gradual warm‑up) strategies can be applied to individual servers so they are not overwhelmed during recovery.

Data Caching

When request volume spikes, the following steps can be taken:

Block incoming requests with a distributed lock.

Cache hot data in a caching middleware during the short blocking period.

After releasing the lock, let all requests operate on the cached data first.

Send the final results to a consumer service via a message queue for asynchronous processing.
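The steps above can be sketched locally as lock‑then‑warm‑the‑cache. A real deployment would use a distributed lock (for example one held in Redis) and a shared cache middleware; the in‑process lock, map, and `loadFromStore` stand‑in below are for illustration only:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// While the first request holds the lock and loads the hot data,
// concurrent requests block briefly; afterwards everyone hits the cache.
class HotDataCache {
    private final ReentrantLock lock = new ReentrantLock(); // stand-in for a distributed lock
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    String get(String key) {
        String value = cache.get(key);
        if (value != null) return value;   // fast path: cached data
        lock.lock();                       // block concurrent loaders briefly
        try {
            return cache.computeIfAbsent(key, this::loadFromStore);
        } finally {
            lock.unlock();
        }
    }

    // Stand-in for the slow read against the primary store.
    private String loadFromStore(String key) {
        return "value-of-" + key;
    }
}
```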

Cache Issues

Read‑Write Separation

Use Redis in Sentinel mode for master‑slave replication; in this scenario reads far outnumber writes, and once inventory reaches zero, read operations can fail fast instead of hitting the store.

Load Balancing

Split inventory across multiple cache instances (e.g., 100 items into 10 caches with 10 items each) and balance requests among them. Beware of hash skew that could concentrate traffic on a single cache.
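The sharding idea can be sketched as routing each decrement to a shard by request key, falling through to other shards when one is empty. Shard counts and the fall‑through policy are illustrative choices, not a prescribed design:

```java
// Split one inventory figure across several cache shards so no single
// instance takes all the decrement traffic (e.g. 100 items -> 10 shards of 10).
class ShardedInventory {
    private final int[] shards;

    ShardedInventory(int totalStock, int shardCount) {
        shards = new int[shardCount];
        for (int i = 0; i < shardCount; i++) {
            shards[i] = totalStock / shardCount;
        }
    }

    // Route by a request key; a skewed key distribution concentrates traffic
    // on one shard, which is exactly the hash-skew risk noted above.
    synchronized boolean tryDecrement(String requestKey) {
        int start = Math.floorMod(requestKey.hashCode(), shards.length);
        for (int i = 0; i < shards.length; i++) { // fall through to other shards
            int idx = (start + i) % shards.length;
            if (shards[idx] > 0) {
                shards[idx]--;
                return true;
            }
        }
        return false; // every shard empty: sold out, fail fast
    }
}
```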

Page Cache

Aggregate short‑term write operations in memory and flush them to the underlying store later, a technique also used by OS page caches, MySQL's buffer‑pool flushing, and similar systems.
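The write-aggregation pattern can be sketched as a buffer that batches records and flushes them downstream once a threshold is reached. The threshold and the `Consumer`-based flush hook are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Buffer short-term writes in memory and flush them in one batch,
// analogous in spirit to an OS page cache.
class WriteBuffer {
    private final int flushThreshold;
    private final Consumer<List<String>> flusher; // the batched write downstream
    private final List<String> pending = new ArrayList<>();

    WriteBuffer(int flushThreshold, Consumer<List<String>> flusher) {
        this.flushThreshold = flushThreshold;
        this.flusher = flusher;
    }

    synchronized void write(String record) {
        pending.add(record);
        if (pending.size() >= flushThreshold) flush();
    }

    synchronized void flush() {
        if (pending.isEmpty()) return;
        flusher.accept(new ArrayList<>(pending)); // one batched write downstream
        pending.clear();
    }
}
```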

Author: 舍其小伙伴 (Source: blog.csdn.net/weixin_44414492/article/details/123027974). Content is shared for learning purposes; copyright belongs to the original author.
Tags: Distributed Systems, Backend Development, Caching, Service Degradation, Rate Limiting
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
