Mastering Rate Limiting for High‑Traffic Flash Sale Systems

This article explains why rate limiting is essential for flash‑sale (seckill) services, compares token‑bucket and leaky‑bucket algorithms, and provides practical configuration examples for Tomcat, Nginx, and OpenResty, along with testing methods and code snippets.

Java Backend Technology

Jun 21, 2018

Preface

As the saying goes, great achievements require persistent effort. Two weeks ago a flash‑sale prototype was shared on a major Chinese platform, where it received feedback and suggestions. Distributed systems, clustering, and flash‑sale mechanisms are not exclusive to large companies; anyone can learn and apply these techniques.

Rate Limiting

When millions of users compete for a limited stock, allowing every request into the queue or cache wastes resources. To protect the backend, we must limit traffic so that only a portion of users receive normal service.

Rate‑Limiting Algorithms

Common algorithms include the token bucket and leaky bucket.

Token Bucket

The token‑bucket algorithm, widely used for traffic shaping and rate limiting, controls the number of data packets sent to a network and permits burst traffic.

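In a flash‑sale scenario, assume requests arrive at 10 r/s, tokens are added at 5 tokens/s, and the bucket holds at most 20 tokens. A full bucket absorbs the initial burst, but it drains at a net 5 tokens/s and is empty after about 4 seconds. From then on only about 5 r/s obtain a token and the remaining requests are dropped.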
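The numbers above can be checked with a small simulation. The following is a minimal token‑bucket sketch in plain Java, not the implementation used by any particular library. The class name TokenBucket, its methods, and the explicit clock parameter are assumptions made for illustration; passing the time in keeps the run deterministic.

```java
/** Minimal token bucket (illustrative sketch); the caller supplies the current time. */
class TokenBucket {
    private final double capacity;      // maximum tokens the bucket can hold
    private final double refillPerSec;  // tokens added per second
    private double tokens;              // tokens currently available
    private long lastMillis;            // time of the last refill

    TokenBucket(double capacity, double refillPerSec, long nowMillis) {
        this.capacity = capacity;
        this.refillPerSec = refillPerSec;
        this.tokens = capacity;         // start full, so an initial burst is allowed
        this.lastMillis = nowMillis;
    }

    /** Consumes one token and returns true if one is available; otherwise returns false. */
    synchronized boolean tryAcquire(long nowMillis) {
        double elapsedSec = (nowMillis - lastMillis) / 1000.0;
        tokens = Math.min(capacity, tokens + elapsedSec * refillPerSec);
        lastMillis = nowMillis;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    /** Replays evenly spaced requests for the given duration and counts how many pass. */
    static int simulate(int seconds, int requestsPerSec, double capacity, double refillPerSec) {
        TokenBucket bucket = new TokenBucket(capacity, refillPerSec, 0L);
        long stepMillis = 1000L / requestsPerSec;
        int accepted = 0;
        for (int i = 0; i < seconds * requestsPerSec; i++) {
            if (bucket.tryAcquire(i * stepMillis)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // 10 r/s for 10 s against a 20-token bucket refilled at 5 tokens/s
        System.out.println("accepted: " + simulate(10, 10, 20, 5) + " of 100");
    }
}
```

In this model 69 of the 100 requests sent over 10 seconds pass: the first 39 are served from the initial 20 tokens plus the refill, and after that every second request is dropped.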

Leaky Bucket

The leaky‑bucket algorithm smooths burst traffic by controlling the rate at which data enters the network, providing a steady flow.

Requests enter the bucket and leave it at a fixed rate. When the bucket is full, excess requests are rejected.
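A leaky bucket can be sketched as a water level that rises by one per request and drains at a constant rate. The version below is the rejecting ("meter") variant described above, written as an illustration only; a queueing variant would hold requests and release them at the leak rate instead. The class name LeakyBucket and the explicit clock parameter are assumptions, not part of any library.

```java
/** Minimal leaky bucket (illustrative sketch); the caller supplies the current time. */
class LeakyBucket {
    private final double capacity;    // how many requests the bucket can hold
    private final double leakPerSec;  // constant drain rate, in requests per second
    private double water;             // requests currently in the bucket
    private long lastMillis;          // time of the last drain calculation

    LeakyBucket(double capacity, double leakPerSec, long nowMillis) {
        this.capacity = capacity;
        this.leakPerSec = leakPerSec;
        this.water = 0.0;             // start empty
        this.lastMillis = nowMillis;
    }

    /** Adds one request to the bucket if it fits; returns false when the bucket is full. */
    synchronized boolean tryAccept(long nowMillis) {
        double elapsedSec = (nowMillis - lastMillis) / 1000.0;
        water = Math.max(0.0, water - elapsedSec * leakPerSec);
        lastMillis = nowMillis;
        if (water + 1.0 <= capacity) {
            water += 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        LeakyBucket bucket = new LeakyBucket(3, 1, 0L);
        // A burst of 5 requests at the same instant: only the first 3 fit.
        for (int i = 1; i <= 5; i++) {
            System.out.println("request " + i + " accepted: " + bucket.tryAccept(0L));
        }
    }
}
```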

Applying Rate Limiting

Tomcat

Configure a custom thread pool, maximum connections, and request queue in Tomcat to achieve rate limiting.

<Executor name="tomcatThreadPool"
          namePrefix="tomcatThreadPool-"
          maxThreads="1000"
          maxIdleTime="300000"
          minSpareThreads="200"/>

name: unique name for the shared thread pool.

namePrefix: prefix for thread names, default tomcat-exec-.

maxThreads: maximum number of threads, default 200.

maxIdleTime: idle time before a thread is closed, default 60000 ms.

minSpareThreads: minimum idle threads to keep, default 25.

Connector Configuration

<Connector executor="tomcatThreadPool"
           port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           acceptCount="1000"/>

executor: references the shared thread pool defined above. When it is set, the Connector's own thread attributes are ignored and the pool's maxThreads and minSpareThreads apply.

minProcessors / maxProcessors: older examples set these (for instance to 5 and 75) to control the threads created at startup and the maximum number of threads. They come from early Tomcat releases and were superseded by minSpareThreads and maxThreads, so they are omitted here.

acceptCount: maximum queue length for incoming connection requests when all request‑processing threads are busy, default 100. Connections that arrive when the queue is full may be refused.
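The effect of a fixed number of worker threads plus a bounded waiting queue can be illustrated with a plain java.util.concurrent pool. This is only an analogy and not how Tomcat is implemented (Tomcat's acceptCount refers to the connection backlog, not a Java queue); the class name BoundedPoolDemo and the helper methods are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPoolDemo {
    /** Submits `tasks` jobs that all block, and returns {accepted, rejected}. */
    static int[] submitBurst(int threads, int queueSize, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(queueSize),   // bounded waiting queue
                new ThreadPoolExecutor.AbortPolicy());         // reject when threads and queue are full
        int accepted = 0;
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(() -> awaitQuietly(release));     // each job holds its thread until released
                accepted++;
            } catch (RejectedExecutionException e) {
                rejected++;
            }
        }
        release.countDown();   // let the blocked jobs finish
        pool.shutdown();       // queued jobs still run before the pool stops
        return new int[] {accepted, rejected};
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[] result = submitBurst(2, 2, 6);
        System.out.println("accepted=" + result[0] + ", rejected=" + result[1]);
    }
}
```

With 2 threads and a queue of 2, a burst of 6 long‑running jobs leaves 2 running, 2 waiting, and 2 rejected, which is the load‑shedding behavior the thread‑pool and queue settings aim for.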

API Rate Limiting

Flash‑sale APIs can experience hundreds‑fold traffic spikes, risking system collapse. Using Guava’s RateLimiter (based on the token‑bucket algorithm) provides a simple way to limit API calls.

Custom Annotation

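import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
 * Custom annotation that marks a method as rate limited.
 */
@Target({ElementType.PARAMETER, ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
@Documented
public @interface ServiceLimit {
    String description() default "";
}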

Aspect Implementation

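import com.google.common.util.concurrent.RateLimiter;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Pointcut;
import org.springframework.context.annotation.Scope;
import org.springframework.stereotype.Component;

@Component
@Scope
@Aspect
public class LimitAspect {
    // About 100 permits per second. The limiter lives in one JVM,
    // so it only bounds a single‑process service.
    private static final RateLimiter RATE_LIMITER = RateLimiter.create(100.0);

    @Pointcut("@annotation(com.itstyle.seckill.common.aop.ServiceLimit)")
    public void serviceAspect() {}

    @Around("serviceAspect()")
    public Object around(ProceedingJoinPoint joinPoint) throws Throwable {
        // Non‑blocking: returns false immediately when no permit is available.
        if (!RATE_LIMITER.tryAcquire()) {
            // Throttled: the target method is skipped and the caller receives null.
            return null;
        }
        // Let exceptions propagate instead of swallowing them, so callers can see failures.
        return joinPoint.proceed();
    }
}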

Business Implementation

@Override
@ServiceLimit
@Transactional
public Result startSeckill(long seckillId, long userId) {
    // business logic omitted, see source code
}

Distributed Rate Limiting

Nginx

Limit each IP to 50 requests per second; excess requests receive a 503 error.

nginx.conf

# Define zone for request rate limiting, keyed by client IP
limit_req_zone $binary_remote_addr zone=api_read:20m rate=50r/s;
# Define zone for connection limiting per IP
limit_conn_zone $binary_remote_addr zone=perip_conn:10m;
# Define zone for connection limiting per server
limit_conn_zone $server_name zone=perserver_conn:100m;

server {
    listen 80;
    server_name seckill.52itstyle.com;
    index index.jsp;
    location / {
        limit_req zone=api_read burst=5;
        limit_conn perip_conn 2;
        limit_conn perserver_conn 1000;
        limit_rate 100k;
        proxy_pass http://seckill;
    }
}

upstream seckill {
    server 172.16.1.120:8080 weight=1 max_fails=2 fail_timeout=30s;
    server 172.16.1.130:8080 weight=1 max_fails=2 fail_timeout=30s;
}

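Explanation: limit_req enforces the 50 r/s rate of the api_read zone. With burst=5 and no nodelay option, up to 5 excess requests are delayed so that they fit the rate, and anything beyond that is rejected (503 by default). limit_conn restricts concurrent connections per IP (2) and per virtual server (1000). limit_rate caps the response bandwidth of each connection at 100 KB/s.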

OpenResty

OpenResty provides Lua modules for traffic shaping. resty.limit.conn limits total concurrent requests, resty.limit.count limits the number of requests per time window, and resty.limit.req smooths the request rate using the leaky‑bucket method; letting bursts through without delaying them gives token‑bucket‑like behavior.

Rate‑Limiting Scenarios

Limit total concurrent requests to protect system stability.

Limit request count per time window to prevent automated ticket‑snatching.

Smooth request flow to a steady rate (e.g., 20 r/s) using leaky‑bucket or token‑bucket.
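The first two scenarios can be sketched in plain Java to show the bookkeeping involved. This is an illustration of the ideas only, not the OpenResty Lua API; the class names and the explicit clock parameter are assumptions. The third scenario corresponds to the token‑bucket and leaky‑bucket algorithms described earlier.

```java
import java.util.concurrent.Semaphore;

class WindowAndConcurrencyLimits {
    /** Fixed-window counter: at most `limit` requests per window of `windowMillis`. */
    static final class FixedWindowCounter {
        private final int limit;
        private final long windowMillis;
        private long windowStart;
        private int count;

        FixedWindowCounter(int limit, long windowMillis, long nowMillis) {
            this.limit = limit;
            this.windowMillis = windowMillis;
            this.windowStart = nowMillis;
        }

        synchronized boolean tryAcquire(long nowMillis) {
            if (nowMillis - windowStart >= windowMillis) {
                windowStart = nowMillis;   // the old window has expired, start a new one
                count = 0;
            }
            if (count < limit) {
                count++;
                return true;
            }
            return false;
        }
    }

    /** Concurrency limit: at most `maxConcurrent` requests in flight at once. */
    static final class ConcurrencyLimiter {
        private final Semaphore permits;

        ConcurrencyLimiter(int maxConcurrent) {
            this.permits = new Semaphore(maxConcurrent);
        }

        boolean tryEnter() {
            return permits.tryAcquire();   // non-blocking; false means "reject now"
        }

        void leave() {
            permits.release();             // call when the request has finished
        }
    }

    public static void main(String[] args) {
        FixedWindowCounter perSecond = new FixedWindowCounter(2, 1000L, 0L);
        System.out.println(perSecond.tryAcquire(0L));    // first request in the window
        System.out.println(perSecond.tryAcquire(10L));   // second request in the window
        System.out.println(perSecond.tryAcquire(20L));   // over the limit for this window
    }
}
```

A fixed window is simple but allows up to twice the limit across a window boundary, which is one reason a smoothing algorithm is used when a steady rate matters.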

Load Testing

Use ApacheBench (ab) to evaluate the configuration.

# Install
yum -y install httpd-tools
# Check version
ab -V
# Help
ab -h
# Test command
ab -n 1000 -c 100 http://127.0.0.1/

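The output reports requests per second, time per request, and the number of failed and non‑2xx responses. Requests rejected by the limiter show up as non‑2xx responses, which is how you can confirm that the rate‑limiting settings are taking effect.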

Conclusion

The techniques presented here (token bucket, leaky bucket, Tomcat thread‑pool tuning, Nginx limits, and OpenResty modules) form a toolbox for handling flash‑sale traffic spikes. Choose the approach that best fits your business scenario.
