Master Rate Limiting: Algorithms, Guava RateLimiter, and Java Implementation
This article introduces the background and necessity of rate limiting, explains the leaky bucket and token bucket algorithms with visual diagrams, and provides a comprehensive Java implementation using Guava's RateLimiter, custom annotations, AOP interception, and controller integration to protect high‑traffic applications.
This series covers the background of rate limiting, common algorithms, a code implementation, and source analysis of rate-limiting components. This first part covers the background, the algorithms, and an implementation based on Guava's RateLimiter; the second part will analyze RateLimiter's source code.
1. Rate Limiting Background
Rate limiting is an important tool for protecting systems by restricting the number of concurrent accesses or the request rate within a time window, preventing large or burst traffic from causing service crashes. When the limit is reached, requests can be rejected or traffic can be shaped.
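As a minimal sketch of the first option (capping the number of concurrent accesses), the following uses java.util.concurrent.Semaphore and rejects a call immediately when no slot is free. The class name ConcurrencyLimiter and its methods are hypothetical names chosen for illustration, not part of any library.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical helper: caps the number of calls running at the same time. */
class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /**
     * Runs the task if a slot is free; otherwise returns the fallback
     * immediately (rejection rather than queuing).
     */
    <T> T call(Supplier<T> task, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback;           // limit reached: reject
        }
        try {
            return task.get();
        } finally {
            permits.release();         // always free the slot
        }
    }

    /** Number of slots currently free. */
    int availableSlots() {
        return permits.availablePermits();
    }
}
```

Rejecting is only one possible policy; replacing tryAcquire() with a blocking acquire() would queue callers instead, which corresponds to shaping traffic rather than dropping it.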
In practice, rate limiting can be applied at the network layer, the access layer (e.g., Nginx), or the application layer. This article focuses on application‑level rate limiting; the principles for other layers are similar, differing only in the technical means used.
Typical high-traffic scenarios include flash sales (seckill) and large e-commerce platforms. Depending on which resource is under pressure (network connections, bandwidth, CPU, or memory), such systems may cap thread pool sizes, database connection pool sizes, the number of concurrent requests, API call rates, or MQ consumption rates.
2. Rate Limiting Algorithms
The two most common basic algorithms are the leaky bucket and the token bucket.
Both can be pictured as a funnel: incoming water represents request traffic, and outgoing water represents the system processing requests. When traffic exceeds capacity, water accumulates and may overflow.
2.1 Leaky Bucket Algorithm
The leaky bucket consists of a queue and a processor. An incoming request is placed in the queue if it is not full, and the processor removes requests from the queue at a fixed rate. If the queue is already full, the new request is discarded.
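The queue-plus-processor structure can be sketched as follows. This is an illustrative model, not a production implementation: the class name LeakyBucket, its parameters, and the choice to pass the current time in explicitly (so the behavior is deterministic) are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Illustrative leaky bucket: a bounded queue drained at a fixed rate. */
class LeakyBucket<T> {
    private final Queue<T> queue = new ArrayDeque<>();
    private final int capacity;         // maximum number of queued requests
    private final long leakIntervalMs;  // minimum spacing between two releases
    private long nextLeakAtMs;          // earliest time the next request may leave

    LeakyBucket(int capacity, long leakIntervalMs, long startMs) {
        this.capacity = capacity;
        this.leakIntervalMs = leakIntervalMs;
        this.nextLeakAtMs = startMs;
    }

    /** Enqueues a request; returns false (request discarded) when the bucket is full. */
    synchronized boolean offer(T request) {
        if (queue.size() >= capacity) {
            return false;
        }
        queue.add(request);
        return true;
    }

    /**
     * Called by the processor. Releases at most one request, and only if the
     * leak interval has elapsed since the previous release; otherwise returns null.
     */
    synchronized T poll(long nowMs) {
        if (queue.isEmpty() || nowMs < nextLeakAtMs) {
            return null;
        }
        nextLeakAtMs = nowMs + leakIntervalMs;
        return queue.poll();
    }
}
```

In a running service the processor would typically be a scheduled task that calls poll(System.currentTimeMillis()) and hands the returned request to a worker.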
2.2 Token Bucket Algorithm
The token bucket is similar in structure but adds tokens at a fixed rate. A request is processed only if a token is available; otherwise it is rejected or buffered.
(1) Tokens are added to the bucket at a constant rate up to a maximum capacity; excess tokens are discarded. (2) When a request arrives, it consumes a token if available; if not, the request is dropped or placed in a buffer.
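Steps (1) and (2) can be sketched with a lazily refilled bucket: instead of a background thread adding tokens, the number of tokens is recomputed from the elapsed time on each request. The class name TokenBucket, the assumption that the bucket starts full, and the explicit nowMs parameter are illustrative choices, and this variant rejects rather than buffers.

```java
/** Illustrative token bucket with lazy refill; time is passed in for determinism. */
class TokenBucket {
    private final double capacity;     // maximum number of stored tokens
    private final double refillPerMs;  // tokens added per millisecond
    private double tokens;
    private long lastRefillMs;

    TokenBucket(double capacity, double tokensPerSecond, long startMs) {
        this.capacity = capacity;
        this.refillPerMs = tokensPerSecond / 1000.0;
        this.tokens = capacity;        // assumption: the bucket starts full
        this.lastRefillMs = startMs;
    }

    /** Step (1): add tokens for the elapsed time and discard anything above capacity. */
    private void refill(long nowMs) {
        if (nowMs > lastRefillMs) {
            tokens = Math.min(capacity, tokens + (nowMs - lastRefillMs) * refillPerMs);
            lastRefillMs = nowMs;
        }
    }

    /** Step (2): consume the requested tokens if available, otherwise reject. */
    synchronized boolean tryAcquire(int permits, long nowMs) {
        refill(nowMs);
        if (tokens < permits) {
            return false;
        }
        tokens -= permits;
        return true;
    }
}
```

With a capacity of 2 and a rate of 1 token per second, two requests at the same instant both succeed, a third is rejected, and another succeeds roughly one second later.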
2.3 Comparison of Token Bucket and Leaky Bucket
• The token bucket adds tokens at a fixed rate, and a request is processed only if a token is available; when tokens run out, new requests are rejected. The leaky bucket drains requests at a constant rate regardless of the arrival rate; excess requests are rejected once the bucket is full.
• The token bucket limits the average inflow rate but permits bursts up to the number of stored tokens, and a single request may consume more than one token. The leaky bucket enforces a constant outflow rate, so bursts are smoothed rather than passed through.
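The difference in burst handling can be made concrete with a deliberately simplified model: a burst of requests arrives at t = 0, and each method reports when every request would be processed, with -1 meaning rejected. The class BurstComparison and its methods are hypothetical; the model assumes a token bucket that starts full and ignores refill during the burst.

```java
import java.util.Arrays;

/** Illustrative comparison: when is each request of a burst at t=0 processed? */
class BurstComparison {

    /** Token bucket that starts full: the first `capacity` requests pass at once, the rest are rejected. */
    static long[] tokenBucket(int burst, int capacity) {
        long[] processedAt = new long[burst];
        for (int i = 0; i < burst; i++) {
            processedAt[i] = (i < capacity) ? 0 : -1;
        }
        return processedAt;
    }

    /** Leaky bucket: the first `capacity` requests queue and leave one per interval, the rest are rejected. */
    static long[] leakyBucket(int burst, int capacity, long intervalMs) {
        long[] processedAt = new long[burst];
        for (int i = 0; i < burst; i++) {
            processedAt[i] = (i < capacity) ? i * intervalMs : -1;
        }
        return processedAt;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenBucket(5, 3)));       // [0, 0, 0, -1, -1]
        System.out.println(Arrays.toString(leakyBucket(5, 3, 500)));  // [0, 500, 1000, -1, -1]
    }
}
```

Both algorithms admit the same three requests here; the token bucket passes them downstream immediately, while the leaky bucket spreads them over one second.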
3. Rate Limiting Implementation
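Guava's RateLimiter provides a token-bucket implementation with two strategies: SmoothBursty, which lets unused permits accumulate so that short bursts can be served, and SmoothWarmingUp, which ramps the rate up gradually over a warm-up period. RateLimiter.create(permitsPerSecond) returns the former; RateLimiter.create(permitsPerSecond, warmupPeriod, unit) returns the latter. The following example demonstrates how to use RateLimiter in a Spring Boot application.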
3.1 Maven Dependency
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>28.1-jre</version>
</dependency>
3.2 Custom Annotation
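import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.concurrent.TimeUnit;

@Target({ElementType.PARAMETER, ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
@Documented
public @interface RequestRateLimitAnnotation {
    /** Permits issued per second; passed to RateLimiter.create(). */
    double limitNum();
    /** Maximum time to wait for a permit. */
    long timeout();
    /** Unit of timeout; defaults to milliseconds. */
    TimeUnit timeUnit() default TimeUnit.MILLISECONDS;
    /** Message returned when no permit is obtained ("Requests are too frequent!"). */
    String errMsg() default "请求太频繁!";
}
3.3 AOP Interceptor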
import com.google.common.util.concurrent.RateLimiter;
import com.itfly8.test.annotation.RequestRateLimitAnnotation;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.servlet.http.HttpServletRequest; // jakarta.servlet on Spring Boot 3
import lombok.extern.slf4j.Slf4j;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Pointcut;
import org.aspectj.lang.reflect.MethodSignature;
import org.springframework.stereotype.Component;
import org.springframework.web.context.request.RequestContextHolder;
import org.springframework.web.context.request.ServletRequestAttributes;
// Response is the project's own result wrapper; its import is omitted because its package is not shown.

@Slf4j
@Aspect
@Component
public class RequestRateLimitAspect {

    /** One RateLimiter per request URI, created on first use and then reused. */
    private final Map<String, RateLimiter> limitMap = new ConcurrentHashMap<>();

    @Pointcut("@annotation(com.itfly8.test.annotation.RequestRateLimitAnnotation)")
    public void pointCut() {}

    @Around("pointCut()")
    public Object doAround(ProceedingJoinPoint joinPoint) throws Throwable {
        HttpServletRequest request =
                ((ServletRequestAttributes) RequestContextHolder.getRequestAttributes()).getRequest();
        String reqUrl = request.getRequestURI();
        RequestRateLimitAnnotation rateLimiter = getRequestRateLimiter(joinPoint);
        if (rateLimiter != null) {
            RateLimiter limiter = getRateLimiter(reqUrl, rateLimiter);
            // Wait up to the configured timeout for a permit; give up if none would be granted in time.
            boolean acquired = limiter.tryAcquire(rateLimiter.timeout(), rateLimiter.timeUnit());
            if (!acquired) {
                // Throttled requests are reported with code "200" here; a distinct code would be easier for clients to detect.
                return new Response<>("200", reqUrl.concat(rateLimiter.errMsg()));
            }
        }
        return joinPoint.proceed();
    }

    /** Returns the limiter for this URI, creating it atomically if it does not exist yet. */
    private RateLimiter getRateLimiter(String reqUrl, RequestRateLimitAnnotation rateLimiter) {
        return limitMap.computeIfAbsent(reqUrl, url -> {
            log.info("RequestRateLimitAspect: created rate limiter for {} at {} permits per second",
                    url, rateLimiter.limitNum());
            return RateLimiter.create(rateLimiter.limitNum());
        });
    }

    /** Reads the annotation from the intercepted method itself, so overloaded methods are resolved correctly. */
    private RequestRateLimitAnnotation getRequestRateLimiter(ProceedingJoinPoint joinPoint) {
        Method method = ((MethodSignature) joinPoint.getSignature()).getMethod();
        return method.getAnnotation(RequestRateLimitAnnotation.class);
    }
}
3.4 Controller Usage
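import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/test")
public class ExampleController {

    // At most 2 permits per second; wait up to 10 ms for a permit before giving up.
    // The return type is Response so that it matches the object the aspect returns
    // when a request is throttled; a String return type would fail with a ClassCastException.
    @RequestRateLimitAnnotation(limitNum = 2, timeout = 10)
    @PostMapping("/ratelimit")
    public Response<String> testRateLimit() {
        // Business logic goes here.
        return new Response<>("200", "success");
    }
}
3.5 Execution Result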
If requests arrive faster than the limit allows, a throttled request receives a response similar to the following (请求太频繁! means "Requests are too frequent!"):
{"code":"200","msg":"/test/ratelimit请求太频繁!","data":{}}
4. Summary
This article covered rate‑limiting scenarios, algorithms, and the use of Guava's RateLimiter, providing a relatively generic implementation that can be extended to build a comprehensive rate‑limiting mechanism for high‑concurrency applications.
ITFLY8 Architecture Home
