Mastering Distributed Rate Limiting: Caching, Degradation, and Flow Control Techniques
This article explains how caching, degradation, and various rate‑limiting strategies—including semaphore‑based concurrency control, token‑bucket algorithms, Guava RateLimiter, custom annotations, Redis interceptors, and Nginx modules—protect high‑concurrency distributed systems, with practical Java code samples and configuration snippets.
Three Essential Tools for High‑Concurrency Systems
When building distributed high‑concurrency systems, three tools protect the system: cache, degradation, and rate limiting.
Cache
Cache aims to improve system access speed and increase processing capacity.
Degradation
Degradation temporarily disables services when problems affect core processes; the services are re‑enabled after the peak period or once the issue is resolved.
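As a rough illustration of the idea, the sketch below wraps a non-core call behind a manual switch. The class name DegradableCall and its methods are hypothetical and only meant to show the shape of a degradation switch, not a production mechanism.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical helper: routes calls to a fallback while the switch is on.
class DegradableCall<T> {
    private final AtomicBoolean degraded = new AtomicBoolean(false);
    private final Supplier<T> primary;   // the normal (possibly expensive) service call
    private final Supplier<T> fallback;  // cheap default used while degraded

    DegradableCall(Supplier<T> primary, Supplier<T> fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    void degrade() { degraded.set(true); }   // e.g. flipped by an operator during a peak
    void restore() { degraded.set(false); }  // re-enable once the issue is resolved

    T call() {
        return degraded.get() ? fallback.get() : primary.get();
    }
}
```

In practice the switch would usually be driven by a configuration center or a monitoring rule rather than called by hand.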
Rate Limiting
Rate limiting protects the system by throttling concurrent requests or limiting the number of requests within a time window; once the limit is reached, the system can reject, queue, or degrade requests.
Problem Scenario
One day, a sudden ten‑fold traffic surge made an interface almost unusable, causing a cascade failure that crashed the whole system. Like an electrical fuse that breaks under overload, an interface needs a “fuse” to prevent unexpected request spikes from overwhelming the system.
Related Concepts
PV
Page View – total number of page accesses; each refresh counts as one.
UV
Unique Visitor – the number of distinct visitors; the same client IP is counted only once per day.
QPS
Queries per second – a key indicator of system load; exceeding a preset threshold may require scaling.
RT
Response Time – the time taken to respond to each request, directly affecting user experience.
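QPS and RT together give a rough estimate of how many requests are in flight at once (Little's law: average concurrency = arrival rate × average time in system). The sketch below does that arithmetic; the class name and the numbers are illustrative assumptions, not measurements from the article.

```java
class CapacityEstimate {
    // Little's law: average in-flight requests = QPS x average response time in seconds.
    static double inFlight(double qps, double avgRtMillis) {
        return qps * avgRtMillis / 1000.0;
    }

    public static void main(String[] args) {
        // Hypothetical load: 200 QPS at an average RT of 50 ms.
        System.out.println(inFlight(200, 50)); // prints 10.0
    }
}
```

An estimate like this is one way to pick a starting value for a concurrency limit before tuning it under real load.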
Application‑Level Rate Limiting
1. Controlling Concurrency
Use a semaphore to limit the number of concurrent accesses. Example in Java:
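import java.util.concurrent.Semaphore;

public class DubboService {
    // Fair semaphore with 10 permits.
    private final Semaphore permit = new Semaphore(10, true);

    public void process() {
        try {
            // Acquire outside the try/finally so release() only runs after a successful acquire.
            permit.acquire();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return;
        }
        try {
            // business logic
        } finally {
            permit.release();
        }
    }
}
The semaphore lets at most ten threads execute the business logic concurrently; additional callers block in acquire() until a permit is released.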
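The blocking example above makes excess callers wait. When rejecting is preferable to queuing, a non-blocking variant can use Semaphore.tryAcquire(). The following is a minimal sketch; BoundedExecutor and callOrReject are hypothetical names chosen for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical wrapper: runs a task only if a permit is free, otherwise rejects at once.
class BoundedExecutor {
    private final Semaphore permits;

    BoundedExecutor(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Returns the task result, or the given rejected value when no permit is available.
    <T> T callOrReject(Supplier<T> task, T rejected) {
        if (!permits.tryAcquire()) {
            return rejected;
        }
        try {
            return task.get();
        } finally {
            permits.release(); // only reached after a successful tryAcquire
        }
    }

    int available() {
        return permits.availablePermits();
    }
}
```

Semaphore also offers tryAcquire(timeout, unit) for a middle ground: wait briefly, then give up.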
2. Controlling Access Rate
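Token‑bucket and leaky‑bucket algorithms are commonly used. The leaky bucket drains requests at a fixed rate and discards those that arrive while the bucket is full, which happens when the incoming rate exceeds the outflow rate for long enough.
For bursty traffic, the token‑bucket is more suitable, because tokens saved up during quiet periods let a short burst through, up to the bucket's capacity.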
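To make the token-bucket idea concrete, here is a minimal, dependency-free sketch. It is not Guava's implementation; the class TokenBucket and the injectable nanosecond clock are assumptions made so the behavior can be tested deterministically.

```java
import java.util.function.LongSupplier;

// Hypothetical minimal token bucket: refills continuously, allows bursts up to capacity.
class TokenBucket {
    private final long capacity;         // maximum number of stored tokens (burst size)
    private final double refillPerNano;  // tokens added per nanosecond
    private final LongSupplier clock;    // nanosecond time source, injectable for tests
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, double tokensPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.clock = nanoClock;
        this.tokens = capacity;          // start full so an initial burst is allowed
        this.lastRefill = nanoClock.getAsLong();
    }

    // Takes one token if available; never blocks.
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

With a real clock this would be created as new TokenBucket(5, 5.0, System::nanoTime): up to 5 requests may pass at once, after which requests are admitted at about 5 per second.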
Google Guava provides a convenient RateLimiter based on the token‑bucket algorithm.
import com.google.common.util.concurrent.RateLimiter;
import java.text.SimpleDateFormat;
import java.util.Date;

// Class name is illustrative.
public class RateLimiterDemo {
    public static void main(String[] args) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        String start = fmt.format(new Date());
        RateLimiter limiter = RateLimiter.create(1.0); // 1 permit per second
        for (int i = 1; i <= 10; i++) {
            // acquire(i) requests i permits and returns the time spent waiting, in seconds
            double waitTime = limiter.acquire(i);
            System.out.println("curTime=" + System.currentTimeMillis()
                    + " call execute:" + i + " waitTime:" + waitTime);
        }
        String end = fmt.format(new Date());
        System.out.println("start time:" + start);
        System.out.println("end time:" + end);
    }
}
RateLimiter supports two modes: SmoothBursty (constant token generation, with unused permits saved for bursts) and SmoothWarmingUp (gradual ramp‑up of the token rate).
SmoothBursty Mode
RateLimiter limiter = RateLimiter.create(5);
This creates a bucket with capacity 5 and adds 5 tokens per second (one token every 200 ms). Calls to acquire() consume tokens; if none are available, the thread waits.
SmoothWarmingUp Mode
RateLimiter limiter = RateLimiter.create(5, 1000, TimeUnit.MILLISECONDS);
This limiter starts below the steady rate and ramps up over a 1‑second warm‑up period before reaching 5 permits per second.
Custom Annotation + AOP for RateLimiter (Single‑Node)
// RateLimitAspect.java
import java.lang.annotation.*;

// Marker annotation for methods that should be rate limited.
@Inherited
@Documented
@Target({ElementType.METHOD, ElementType.FIELD, ElementType.TYPE})
@Retention(RetentionPolicy.RUNTIME)
public @interface RateLimitAspect {}

// RateLimitAop.java
import com.google.common.util.concurrent.RateLimiter;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Pointcut;
import org.springframework.stereotype.Component;

@Component
@Aspect
public class RateLimitAop {
    // One limiter shared by all annotated methods: 5 permits per second.
    private final RateLimiter rateLimiter = RateLimiter.create(5.0);

    @Pointcut("@annotation(com.test.cn.springbootdemo.aspect.RateLimitAspect)")
    public void serviceLimit() {}

    // proceed() can throw Throwable, so the advice must declare it.
    @Around("serviceLimit()")
    public Object around(ProceedingJoinPoint joinPoint) throws Throwable {
        if (rateLimiter.tryAcquire()) {
            return joinPoint.proceed();
        }
        // No permit available: return a failure response (placeholder value).
        return "rate limit exceeded";
    }
}

// TestController.java
import com.test.cn.springbootdemo.aspect.RateLimitAspect;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
public class TestController {
    @ResponseBody
    @RateLimitAspect
    @RequestMapping("/test")
    public String test() {
        return "success";
    }
}
3. Controlling Requests per Time Window
Limit the number of calls per second/minute/day. Example limiting to 50 QPS:
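// Uses Guava's LoadingCache, CacheBuilder, and CacheLoader.
// One counter per wall-clock second; entries expire 2 seconds after creation.
private final LoadingCache<Long, AtomicLong> counter = CacheBuilder.newBuilder()
        .expireAfterWrite(2, TimeUnit.SECONDS)
        .build(new CacheLoader<Long, AtomicLong>() {
            @Override
            public AtomicLong load(Long seconds) {
                return new AtomicLong(0);
            }
        });

public static long permit = 50;

// ResponseEntity is presumably a project-specific wrapper; Spring's ResponseEntity
// has no builder().code().msg() API.
public ResponseEntity getData() throws ExecutionException {
    long currentSeconds = System.currentTimeMillis() / 1000;
    if (counter.get(currentSeconds).incrementAndGet() > permit) {
        // 429 = Too Many Requests
        return ResponseEntity.builder().code(429).msg("Rate too high").build();
    }
    // business logic
    return ResponseEntity.builder().code(200).msg("ok").build();
}
Application‑level limits work only within a single instance; for global limits we need distributed solutions.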
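The per-second counter from this section can also be written with only the JDK. The sketch below is one possible version; FixedWindowLimiter and its injectable millisecond clock are hypothetical names introduced here so the window rollover can be tested without sleeping.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical JDK-only fixed-window counter: one AtomicLong per wall-clock second.
class FixedWindowLimiter {
    private final long limitPerSecond;
    private final LongSupplier millisClock; // injectable for tests
    private final ConcurrentHashMap<Long, AtomicLong> counters = new ConcurrentHashMap<>();

    FixedWindowLimiter(long limitPerSecond, LongSupplier millisClock) {
        this.limitPerSecond = limitPerSecond;
        this.millisClock = millisClock;
    }

    boolean tryAcquire() {
        long second = millisClock.getAsLong() / 1000;
        // Drop counters for past seconds so the map does not grow without bound.
        counters.keySet().removeIf(k -> k < second);
        return counters.computeIfAbsent(second, k -> new AtomicLong())
                       .incrementAndGet() <= limitPerSecond;
    }
}
```

One known property of any fixed window is that up to twice the limit can pass around a window boundary, for example 50 requests at the end of one second and 50 more at the start of the next.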
Distributed Rate Limiting
Combine a custom annotation, interceptor, and Redis to enforce global limits.
// AccessLimit.java: at most limit() requests per sec() seconds.
@Inherited
@Documented
@Target({ElementType.FIELD, ElementType.TYPE, ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
public @interface AccessLimit {
    int limit() default 5;
    int sec() default 5;
}

// AccessLimitInterceptor.java
public class AccessLimitInterceptor implements HandlerInterceptor {
    @Autowired
    private RedisTemplate<String, Integer> redisTemplate;

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response,
                             Object handler) throws Exception {
        if (handler instanceof HandlerMethod) {
            Method method = ((HandlerMethod) handler).getMethod();
            AccessLimit limit = method.getAnnotation(AccessLimit.class);
            if (limit == null) {
                return true; // method is not rate limited
            }
            // One counter per client IP and request URI.
            String key = IPUtil.getIpAddr(request) + request.getRequestURI();
            // Note: this get-then-set sequence is not atomic, and every set() refreshes
            // the TTL, so concurrent requests can slip past the limit.
            Integer count = redisTemplate.opsForValue().get(key);
            if (count == null) {
                redisTemplate.opsForValue().set(key, 1, limit.sec(), TimeUnit.SECONDS);
            } else if (count < limit.limit()) {
                redisTemplate.opsForValue().set(key, count + 1, limit.sec(), TimeUnit.SECONDS);
            } else {
                response.setContentType("application/json;charset=UTF-8");
                response.getOutputStream().write("Request too frequent!".getBytes("UTF-8"));
                return false;
            }
        }
        return true;
    }
}

// AopController.java
@Controller
@RequestMapping("/activity")
public class AopController {
    @ResponseBody
    @RequestMapping("/seckill")
    @AccessLimit(limit = 4, sec = 10) // 4 requests per 10 seconds per IP
    public String test(HttpServletRequest request) {
        return "hello world!";
    }
}
When the same IP exceeds the limit within the defined window, further requests are blocked.
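A shared counter is safest when the increment and the expiry are handled as one atomic step, with the expiry set only when the key is first created. The sketch below shows that logic against an in-memory stand-in for Redis so it can run on its own; InMemoryCounterStore, incrementWithTtl, and AccessLimiter are hypothetical names, and a real deployment would need an equivalent atomic operation on the Redis side.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a shared store such as Redis.
class InMemoryCounterStore {
    private static final class Entry {
        long count;
        long expiresAtMillis;
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final LongSupplier millisClock; // injectable for tests

    InMemoryCounterStore(LongSupplier millisClock) {
        this.millisClock = millisClock;
    }

    // Atomically increments the counter; the TTL is set only when the key is created,
    // so later requests do not extend the window.
    synchronized long incrementWithTtl(String key, long ttlMillis) {
        long now = millisClock.getAsLong();
        Entry e = entries.get(key);
        if (e == null || now >= e.expiresAtMillis) {
            e = new Entry();
            e.expiresAtMillis = now + ttlMillis; // window starts at the first request
            entries.put(key, e);
        }
        return ++e.count;
    }
}

class AccessLimiter {
    private final InMemoryCounterStore store;

    AccessLimiter(InMemoryCounterStore store) {
        this.store = store;
    }

    // Key layout (client IP + URI) follows the interceptor in this section.
    boolean allow(String ip, String uri, int limit, int windowSeconds) {
        return store.incrementWithTtl(ip + uri, windowSeconds * 1000L) <= limit;
    }
}
```

Because the window is anchored at the first request, a client that keeps retrying does not push its own expiry further into the future.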
Ingress‑Level Rate Limiting (Nginx)
Use the Nginx limit_req module (leaky‑bucket algorithm) to restrict the request rate and the limit_conn module to cap concurrent connections, both keyed on client IP.
# Both *_zone directives belong in the http context.
limit_req_zone $binary_remote_addr zone=one:10m rate=20r/s;
limit_conn_zone $binary_remote_addr zone=addr:10m;

server {
    limit_req zone=one burst=5;   # 20 requests/s per IP, bursts of up to 5 excess requests
    limit_conn addr 30;           # at most 30 concurrent connections per IP
}
Example limiting connections for a specific location:
http {
    limit_conn_zone $binary_remote_addr zone=addr:10m;
    server {
        location /download/ {
            limit_conn addr 1;    # one connection per IP
        }
    }
}
These configurations help protect services from traffic spikes at the network edge.
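To get a feel for what rate and burst mean in leaky-bucket terms, the Java sketch below models only the accept/reject decision for a single client. It is a simplified model written for illustration, not Nginx's implementation: it ignores the delaying of queued requests, and the class LeakyBucket is a hypothetical name.

```java
import java.util.function.LongSupplier;

// Hypothetical leaky-bucket model for one client key, using integer math
// in units of 1/1000 of a request to avoid floating-point drift.
class LeakyBucket {
    private final long ratePerSecond;       // drain rate, e.g. 20 for rate=20r/s
    private final long burstMilli;          // tolerated backlog, in 1/1000 requests
    private final LongSupplier millisClock; // injectable for tests
    private long excessMilli = 0L;          // current backlog
    private long last;
    private boolean seen = false;

    LeakyBucket(long ratePerSecond, long burst, LongSupplier millisClock) {
        this.ratePerSecond = ratePerSecond;
        this.burstMilli = burst * 1000L;
        this.millisClock = millisClock;
    }

    synchronized boolean tryAccept() {
        long now = millisClock.getAsLong();
        if (!seen) {                         // first request starts with an empty bucket
            seen = true;
            last = now;
            return true;
        }
        // Drain for the elapsed time, then add this request (1000 = one request).
        long candidate = Math.max(0L, excessMilli - ratePerSecond * (now - last) + 1000L);
        if (candidate > burstMilli) {
            return false;                    // bucket would overflow: reject
        }
        excessMilli = candidate;
        last = now;
        return true;
    }
}
```

Under this model, rate=20 and burst=5 admit six back-to-back requests (one plus five of backlog) and reject the seventh; 50 ms later one more request fits.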
21CTO
