
Mastering Distributed Rate Limiting: Caching, Degradation, and Flow Control Techniques

This article explains how caching, degradation, and various rate‑limiting strategies—including semaphore‑based concurrency control, token‑bucket algorithms, Guava RateLimiter, custom annotations, Redis interceptors, and Nginx modules—protect high‑concurrency distributed systems, with practical Java code samples and configuration snippets.

21CTO

Feb 26, 2020


Three Essential Tools for High‑Concurrency Systems

When building distributed high‑concurrency systems, three tools protect the system: cache, degradation, and rate limiting.

Cache

Cache aims to improve system access speed and increase processing capacity.

Degradation

Degradation temporarily disables a service when it is failing or when it threatens core processes; the service is re-enabled after the peak period or once the issue is resolved.
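As a rough illustration of the idea (not taken from the original article), a degradation switch can be modelled as a flag that routes calls to a cheap fallback while it is on. The class name `DegradableCall` and its methods are hypothetical; in practice the flag would more likely be flipped from a configuration center or a circuit breaker than by hand.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical helper: wraps a call with a manual degradation switch.
final class DegradableCall<T> {
    private final AtomicBoolean degraded = new AtomicBoolean(false);
    private final Supplier<T> primary;   // the normal, possibly expensive call
    private final Supplier<T> fallback;  // cheap default served while degraded

    DegradableCall(Supplier<T> primary, Supplier<T> fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    // Turn degradation on during a peak or incident, off again afterwards.
    void setDegraded(boolean on) {
        degraded.set(on);
    }

    T get() {
        return degraded.get() ? fallback.get() : primary.get();
    }

    public static void main(String[] args) {
        DegradableCall<String> recommendations =
                new DegradableCall<>(() -> "personalized list", () -> "static top-10 list");
        System.out.println(recommendations.get()); // normal path
        recommendations.setDegraded(true);
        System.out.println(recommendations.get()); // fallback path
    }
}
```

The core flow keeps working while the non-essential call is switched off, which is the point of degrading instead of failing outright.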

Rate Limiting

Rate limiting protects the system by throttling concurrent requests or limiting the number of requests within a time window; once the limit is reached, the system can reject, queue, or degrade requests.

Problem Scenario

One day, a sudden ten‑fold traffic surge made an interface almost unusable, causing a cascade failure that crashed the whole system. Like an electrical fuse that breaks under overload, an interface needs a “fuse” to prevent unexpected request spikes from overwhelming the system.

Related Concepts

PV

Page View – total number of page accesses; each refresh counts as one.

UV

Unique Visitor – the number of distinct visitors; here each client IP is counted once per day.

QPS

Queries per second – a key indicator of system load; exceeding a preset threshold may require scaling.

RT

Response Time – the time taken to respond to each request, directly affecting user experience.
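QPS and RT together determine how many requests are in flight at once: by Little's law, average concurrency is roughly QPS multiplied by average response time. The helper below is an illustrative sketch (the class and method names are made up for this example) that turns the two metrics into a rough concurrency budget, for instance when choosing a limit for concurrent requests.

```java
// Illustrative helper (hypothetical names): estimates in-flight requests
// from QPS and average response time using Little's law.
final class ConcurrencyEstimate {

    /** Average concurrency = QPS x RT, with RT in milliseconds, rounded up. */
    static long inFlight(long qps, long avgRtMillis) {
        return (qps * avgRtMillis + 999) / 1000;
    }

    public static void main(String[] args) {
        System.out.println(inFlight(200, 50));   // 200 QPS at 50 ms each
        System.out.println(inFlight(1000, 120)); // 1000 QPS at 120 ms each
    }
}
```

This is only an average; real traffic is bursty, so a limit derived this way usually needs headroom.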

Application‑Level Rate Limiting

1. Controlling Concurrency

Use a semaphore to limit the number of concurrent accesses. Example in Java:

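import java.util.concurrent.Semaphore;

public class DubboService {
    // Fair semaphore: at most 10 threads may run the business logic at once.
    private final Semaphore permit = new Semaphore(10, true);

    public void process() {
        try {
            // Acquire outside the try/finally so a failed acquire never releases a permit.
            permit.acquire();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return;
        }
        try {
            // business logic
        } finally {
            permit.release();
        }
    }
}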

The semaphore lets at most ten threads execute the business logic at the same time; any additional callers block in acquire() until a permit is released.
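A blocking acquire() queues every excess caller. Where rejecting or degrading is preferable, a bounded wait with tryAcquire is one option. The following is a sketch using only java.util.concurrent; `BoundedExecutor`, its constructor parameters, and the fallback value are illustrative names, not part of the original example.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical wrapper: runs a task only if a permit is obtained in time.
final class BoundedExecutor {
    private final Semaphore permits;
    private final long maxWaitMillis;

    BoundedExecutor(int maxConcurrent, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrent, true);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Returns the task result, or the fallback if no permit became free in time. */
    <T> T call(Supplier<T> task, T fallback) {
        try {
            if (!permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS)) {
                return fallback; // limit reached: reject instead of queueing forever
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return fallback;
        }
        try {
            return task.get();
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        BoundedExecutor executor = new BoundedExecutor(10, 50);
        System.out.println(executor.call(() -> "success", "busy, try later"));
    }
}
```

A wait of zero milliseconds turns this into pure rejection; a small positive wait absorbs short spikes without letting callers pile up.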

2. Controlling Access Rate

Token-bucket and leaky-bucket algorithms are commonly used. A leaky bucket drains requests at a fixed rate; when requests arrive faster than they drain and the bucket fills up, the excess is discarded.
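To make the leaky-bucket behaviour concrete, here is a minimal sketch of the queue form of the algorithm. It is an illustration under stated assumptions, not the article's code: the class `LeakyBucket` and its `offer` method are hypothetical, time is passed in explicitly so the logic is easy to follow, and the caller is expected to sleep for the returned delay.

```java
// Illustrative leaky bucket (queue form): requests leave at a fixed interval,
// and a request is dropped when too many are already waiting.
final class LeakyBucket {
    private final long intervalNanos; // spacing between two outgoing requests
    private final int capacity;       // how many requests may wait in the bucket
    private long nextFreeNanos;       // earliest time the next request may leave

    LeakyBucket(long intervalNanos, int capacity, long nowNanos) {
        this.intervalNanos = intervalNanos;
        this.capacity = capacity;
        this.nextFreeNanos = nowNanos;
    }

    /** Returns the wait in nanoseconds before proceeding, or -1 if the bucket overflowed. */
    synchronized long offer(long nowNanos) {
        long departure = Math.max(nextFreeNanos, nowNanos);
        long delay = departure - nowNanos;
        if (delay > capacity * intervalNanos) {
            return -1; // bucket full: discard the request
        }
        nextFreeNanos = departure + intervalNanos;
        return delay;
    }

    public static void main(String[] args) {
        long ms = 1_000_000L;
        LeakyBucket bucket = new LeakyBucket(100 * ms, 2, 0); // 10 requests/s, 2 may wait
        for (int i = 1; i <= 5; i++) {
            long delay = bucket.offer(0); // five requests arrive at the same instant
            System.out.println("request " + i + ": "
                    + (delay < 0 ? "dropped" : "wait " + delay / ms + " ms"));
        }
    }
}
```

In the demo the first request leaves immediately, the next two are spaced 100 ms apart, and the remaining two are dropped, so the output rate stays constant regardless of how bursty the input is.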

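For bursty traffic, the token bucket is more suitable: tokens that accumulate while the system is idle allow a short burst to pass at once, up to the bucket's capacity.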
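The token-bucket idea can be sketched in a few lines of plain Java. This is a simplified illustration, not Guava's implementation: `TokenBucket` and `tryAcquire(long)` are hypothetical names, the bucket starts full, and the current time is passed in by the caller.

```java
// Illustrative token bucket: tokens refill continuously up to a fixed capacity,
// and each request consumes one token or is rejected.
final class TokenBucket {
    private final double ratePerSecond; // tokens added per second
    private final double capacity;      // maximum tokens that can be stored
    private double tokens;              // tokens currently available
    private long lastNanos;             // time of the last refill

    TokenBucket(double ratePerSecond, double capacity, long nowNanos) {
        this.ratePerSecond = ratePerSecond;
        this.capacity = capacity;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastNanos = nowNanos;
    }

    synchronized boolean tryAcquire(long nowNanos) {
        long elapsed = Math.max(0, nowNanos - lastNanos);
        // Refill for the elapsed time, never exceeding the capacity.
        tokens = Math.min(capacity, tokens + elapsed * ratePerSecond / 1e9);
        lastNanos = Math.max(lastNanos, nowNanos);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(5, 5, System.nanoTime());
        for (int i = 1; i <= 7; i++) {
            System.out.println("request " + i + " allowed: " + bucket.tryAcquire(System.nanoTime()));
        }
    }
}
```

With a rate of 5 and a capacity of 5, a burst of five requests passes immediately and further requests are admitted only as tokens trickle back in.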

Google Guava provides a convenient RateLimiter based on the token‑bucket algorithm.

import com.google.common.util.concurrent.RateLimiter;
import java.text.SimpleDateFormat;
import java.util.Date;

public class RateLimiterDemo {
    public static void main(String[] args) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        String start = format.format(new Date());
        RateLimiter limiter = RateLimiter.create(1.0); // 1 permit per second
        for (int i = 1; i <= 10; i++) {
            // acquire(i) requests i permits and returns the seconds spent waiting
            double waitTime = limiter.acquire(i);
            System.out.println("curTime=" + System.currentTimeMillis()
                    + " call execute:" + i + " waitTime:" + waitTime);
        }
        String end = format.format(new Date());
        System.out.println("start time:" + start);
        System.out.println("end time:" + end);
    }
}

RateLimiter supports two modes: SmoothBursty (constant token generation, smooth burst handling) and SmoothWarmingUp (gradual ramp‑up of token rate).

SmoothBursty Mode

RateLimiter limiter = RateLimiter.create(5);

This creates a limiter that issues 5 permits per second (one every 200 ms) and can store up to one second's worth of unused permits, that is, 5. Calls to acquire() consume permits; if none are available, the calling thread waits.

SmoothWarmingUp Mode

RateLimiter limiter = RateLimiter.create(5, 1000, TimeUnit.MILLISECONDS);

This limiter starts cold at a lower rate and ramps up over the 1-second warm-up period before reaching the steady rate of 5 permits per second.

Custom Annotation + AOP for RateLimiter (Single‑Node)

import java.lang.annotation.*;

@Inherited
@Documented
@Target({ElementType.METHOD, ElementType.FIELD, ElementType.TYPE})
@Retention(RetentionPolicy.RUNTIME)
public @interface RateLimitAspect {
}

import com.google.common.util.concurrent.RateLimiter;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Pointcut;
import org.springframework.stereotype.Component;

@Component
@Aspect
public class RateLimitAop {

    // One shared limiter: 5 permits per second across all annotated methods.
    private final RateLimiter rateLimiter = RateLimiter.create(5.0);

    @Pointcut("@annotation(com.test.cn.springbootdemo.aspect.RateLimitAspect)")
    public void serviceLimit() {}

    @Around("serviceLimit()")
    public Object around(ProceedingJoinPoint joinPoint) throws Throwable {
        if (rateLimiter.tryAcquire()) {
            return joinPoint.proceed();
        }
        // No permit available: return a failure response instead of calling the method.
        // A plain string is assumed here because the sample controller returns String.
        return "Too many requests, please try again later";
    }
}

import com.test.cn.springbootdemo.aspect.RateLimitAspect;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
public class TestController {

    @ResponseBody
    @RateLimitAspect
    @RequestMapping("/test")
    public String test() {
        return "success";
    }
}

3. Controlling Requests per Time Window

Limit the number of calls per second/minute/day. Example limiting to 50 QPS:

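import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// One counter per wall-clock second; entries expire after 2 seconds so old counters are cleaned up.
private final LoadingCache<Long, AtomicLong> counter = CacheBuilder.newBuilder()
        .expireAfterWrite(2, TimeUnit.SECONDS)
        .build(new CacheLoader<Long, AtomicLong>() {
            @Override
            public AtomicLong load(Long seconds) {
                return new AtomicLong(0);
            }
        });

public static long permit = 50; // maximum requests per second

// ResponseEntity is assumed to be a project-specific response wrapper with a builder.
public ResponseEntity getData() throws ExecutionException {
    long currentSeconds = System.currentTimeMillis() / 1000;
    if (counter.get(currentSeconds).incrementAndGet() > permit) {
        // 429 (Too Many Requests) describes the rejection better than 404.
        return ResponseEntity.builder().code(429).msg("Rate too high").build();
    }
    // business logic goes here; the success response below is a placeholder
    return ResponseEntity.builder().code(200).msg("OK").build();
}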
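A fixed one-second counter like the one above resets at every second boundary, so a burst that straddles the boundary can briefly pass up to twice the limit. One common alternative is a sliding-window log, sketched below with the JDK only. The class `SlidingWindowLimiter` is hypothetical, and it trades memory (one timestamp per admitted request) for accuracy.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sliding-window log: admits at most `limit` requests in any
// window of `windowMillis`, measured backwards from the current request.
final class SlidingWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Forget requests that have slid out of the window.
        while (!timestamps.isEmpty() && nowMillis - timestamps.peekFirst() >= windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() >= limit) {
            return false;
        }
        timestamps.addLast(nowMillis);
        return true;
    }

    public static void main(String[] args) {
        SlidingWindowLimiter limiter = new SlidingWindowLimiter(50, 1000);
        int admitted = 0;
        for (int i = 0; i < 60; i++) {
            if (limiter.tryAcquire(System.currentTimeMillis())) {
                admitted++;
            }
        }
        System.out.println("admitted " + admitted + " of 60 requests");
    }
}
```

For large limits, a counter per sub-interval (for example ten 100 ms slots) gives a similar effect with bounded memory.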

Application‑level limits work only within a single instance; for global limits we need distributed solutions.

Distributed Rate Limiting

Combine a custom annotation, interceptor, and Redis to enforce global limits.

@Inherited
@Documented
@Target({ElementType.FIELD, ElementType.TYPE, ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
public @interface AccessLimit {
    int limit() default 5; // maximum number of requests per window
    int sec() default 5;   // window length in seconds
}

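import java.lang.reflect.Method;
import java.util.concurrent.TimeUnit;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.web.method.HandlerMethod;
import org.springframework.web.servlet.HandlerInterceptor;

public class AccessLimitInterceptor implements HandlerInterceptor {

    // StringRedisTemplate stores the counter as a plain string, which Redis INCR requires.
    @Autowired
    private StringRedisTemplate redisTemplate;

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response,
                             Object handler) throws Exception {
        if (!(handler instanceof HandlerMethod)) {
            return true;
        }
        Method method = ((HandlerMethod) handler).getMethod();
        AccessLimit limit = method.getAnnotation(AccessLimit.class);
        if (limit == null) {
            return true;
        }
        // One counter per client IP and URI; IPUtil is the project's own helper.
        String key = IPUtil.getIpAddr(request) + request.getRequestURI();
        // INCR is atomic, so concurrent requests cannot overwrite each other's count.
        Long count = redisTemplate.opsForValue().increment(key, 1);
        if (count != null && count == 1) {
            // Only the first request of a window sets the expiry; later ones do not reset it.
            redisTemplate.expire(key, limit.sec(), TimeUnit.SECONDS);
        }
        if (count != null && count > limit.limit()) {
            response.setStatus(429); // Too Many Requests
            response.setContentType("application/json;charset=UTF-8");
            response.getOutputStream().write("Request too frequent!".getBytes("UTF-8"));
            return false;
        }
        return true;
    }
}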

@Controller
@RequestMapping("/activity")
public class AopController {

    @ResponseBody
    @RequestMapping("/seckill")
    @AccessLimit(limit = 4, sec = 10) // at most 4 requests per IP every 10 seconds
    public String test(HttpServletRequest request) {
        return "hello world!";
    }
}

When the same IP exceeds the limit for the same URI within the defined window, further requests are blocked until the Redis key expires.

Ingress‑Level Rate Limiting (Nginx)

Use the Nginx limit_req module (leaky-bucket algorithm) to restrict the request rate and the limit_conn module to restrict the number of concurrent connections, both keyed here by client IP.

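# The two *_zone directives belong in the http block.
limit_req_zone $binary_remote_addr zone=one:10m rate=20r/s;   # 20 requests per second per client IP
limit_conn_zone $binary_remote_addr zone=addr:10m;

server {
    limit_req zone=one burst=5;   # queue up to 5 excess requests, reject the rest
    limit_conn addr 30;           # at most 30 concurrent connections per client IP
}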

Example limiting connections for a specific location:

http {
    limit_conn_zone $binary_remote_addr zone=addr:10m;

    server {
        location /download/ {
            limit_conn addr 1;   # one concurrent connection per client IP
        }
    }
}

These configurations help protect services from traffic spikes at the network edge.
