Mastering API Rate Limiting in Spring Boot: Algorithms, Guava & AOP
This tutorial explains why API rate limiting is essential for high‑traffic Spring Boot services, introduces counter, leaky‑bucket, and token‑bucket algorithms, shows how to use Guava's RateLimiter, and demonstrates a clean custom‑annotation AOP solution to decouple rate‑limiting logic from business code.
Today we discuss how to implement rate limiting for APIs in a Spring Boot project, covering common algorithms and elegant solutions.
Why rate limit?
High concurrency and traffic spikes (e.g., flash sales) can overwhelm a system. Rate limiting protects availability by queuing, degrading, or rejecting excess traffic. China's 12306 railway ticketing system is a typical example.
What is rate limiting? Common algorithms
Rate limiting restricts the number of requests in a time window to keep the system stable.
1. Counter-based limiting
The simplest method, used to cap total concurrency such as DB connection pool size or thread pool size. An AtomicInteger can track current concurrent executions and reject when the threshold is exceeded.
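The counter approach can be sketched roughly as follows. This is a minimal illustration, not code from the original project: the class name ConcurrencyLimiter and its methods are hypothetical, and it assumes the caller releases the slot in a finally block.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not from the article): caps how many calls run at once. */
class ConcurrencyLimiter {
    private final int maxConcurrent;
    private final AtomicInteger inFlight = new AtomicInteger();

    ConcurrencyLimiter(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
    }

    /** Tries to reserve a slot; returns false when the cap is already reached. */
    boolean tryEnter() {
        if (inFlight.incrementAndGet() > maxConcurrent) {
            inFlight.decrementAndGet(); // roll back the optimistic increment
            return false;
        }
        return true;
    }

    /** Releases a slot previously reserved with tryEnter(). */
    void exit() {
        inFlight.decrementAndGet();
    }

    /** Number of calls currently holding a slot. */
    int inFlight() {
        return inFlight.get();
    }
}
```

A caller would typically check tryEnter() at the start of a request, reject the request when it returns false, and call exit() in a finally block once the work is done.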
2. Leaky Bucket algorithm
The bucket represents the system’s processing capacity. Requests (water) flow into the bucket; they leave at a fixed rate. If incoming rate exceeds the outflow, excess requests overflow and are rejected.
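One way to picture the leaky bucket is as a water level that drains at a fixed rate. The sketch below is an illustration under stated assumptions: LeakyBucket is a hypothetical class, each request counts as one unit of water, and the clock is injected so the behavior can be checked without sleeping.

```java
import java.util.function.LongSupplier;

/** Hypothetical leaky-bucket meter (not from the article or any library). */
class LeakyBucket {
    private final double capacity;        // how much "water" (requests) the bucket can hold
    private final double leakPerSecond;   // fixed outflow rate, in requests per second
    private final LongSupplier nanoClock; // injectable clock; System::nanoTime in real use
    private double water;                 // current level
    private long lastLeakNanos;

    LeakyBucket(double capacity, double leakPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.leakPerSecond = leakPerSecond;
        this.nanoClock = nanoClock;
        this.lastLeakNanos = nanoClock.getAsLong();
    }

    /** Returns true if the request fits in the bucket, false if it overflows. */
    synchronized boolean tryAccept() {
        long now = nanoClock.getAsLong();
        // Drain the water that leaked out since the previous call.
        double leaked = (now - lastLeakNanos) / 1_000_000_000.0 * leakPerSecond;
        water = Math.max(0.0, water - leaked);
        lastLeakNanos = now;
        if (water + 1.0 > capacity) {
            return false; // overflow: reject the request
        }
        water += 1.0;
        return true;
    }
}
```

With a capacity of 2 and a leak rate of 1 per second, two back-to-back requests are accepted, a third is rejected, and one more slot opens up after each second that passes.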
3. Token Bucket algorithm
Tokens are added to a bucket at a constant rate, up to the bucket's capacity. A request must acquire a token before proceeding; if none are available, the request is rejected (or made to wait). The bucket capacity bounds the size of a burst, and the token rate bounds the sustained throughput.
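The token-bucket idea can be sketched in a few lines of plain Java. This is a simplified illustration, not Guava's implementation: SimpleTokenBucket is a hypothetical class, it starts with a full bucket, and the clock is injected so the refill logic can be tested deterministically.

```java
import java.util.function.LongSupplier;

/** Hypothetical token bucket (a simplified sketch, not Guava's RateLimiter). */
class SimpleTokenBucket {
    private final double capacity;        // maximum number of stored tokens (burst size)
    private final double refillPerSecond; // tokens added per second
    private final LongSupplier nanoClock; // injectable clock; System::nanoTime in real use
    private double tokens;
    private long lastRefillNanos;

    SimpleTokenBucket(double capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // assumption: start full so an initial burst is allowed
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Takes one token if available; returns false without waiting otherwise. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        // Add the tokens earned since the previous call, capped at the capacity.
        double earned = (now - lastRefillNanos) / 1_000_000_000.0 * refillPerSecond;
        tokens = Math.min(capacity, tokens + earned);
        lastRefillNanos = now;
        if (tokens < 1.0) {
            return false;
        }
        tokens -= 1.0;
        return true;
    }
}
```

Unlike the leaky bucket, idle time here accumulates tokens (up to the capacity), so a short burst after a quiet period is allowed through.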
Rate limiting with Guava
Guava’s RateLimiter implements the token‑bucket algorithm. Steps:
1. Add Guava dependency
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>30.1-jre</version>
</dependency>
2. Apply rate‑limiting logic to an endpoint
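import com.google.common.util.concurrent.RateLimiter;
import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.TimeUnit;

@Slf4j
@RestController
@RequestMapping("/limit")
public class LimitController {
    /** Rate limit: 2 requests per second */
    private final RateLimiter limiter = RateLimiter.create(2.0);
    private final DateTimeFormatter dtf = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    @GetMapping("/test1")
    public String testLimiter() {
        // If a token is not obtained within 500 ms, trigger degradation
        boolean tryAcquire = limiter.tryAcquire(500, TimeUnit.MILLISECONDS);
        if (!tryAcquire) {
            log.warn("Service degraded at {}", LocalDateTime.now().format(dtf));
            return "Current queue is long, please try later!";
        }
        log.info("Token acquired at {}", LocalDateTime.now().format(dtf));
        return "Request succeeded";
    }
}
The two core methods are create() and tryAcquire(). Detailed usage: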
acquire() – blocks until a token is obtained and returns the time spent waiting, in seconds.
acquire(int permits) – blocks until the given number of tokens is obtained.
tryAcquire() – returns false immediately if no token is available.
tryAcquire(int permits) – the same, for multiple tokens.
tryAcquire(long timeout, TimeUnit unit) – waits up to the timeout for a token and returns false if it cannot be obtained in time.
tryAcquire(int permits, long timeout, TimeUnit unit) – waits up to the timeout for the given number of tokens.
3. Test the endpoint
Access http://127.0.0.1:8080/limit/test1 repeatedly and observe the logs. Roughly two requests per second succeed; requests that cannot obtain a token within 500 ms are degraded.
Embedding tryAcquire() directly in each controller mixes business logic with rate‑limiting code and violates DRY principles.
Optimizing with custom annotation + AOP
Define a @Limit annotation and an AOP aspect that manages a map of RateLimiter instances keyed by the annotation’s key. The aspect obtains a token according to the configured permits per second, timeout, and time unit, returning a custom error message when throttling occurs.
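The "map of limiters keyed by the annotation's key" can be sketched on its own before wiring it into an aspect. This is an illustrative sketch under assumptions: KeyedLimiters and resolveKey are hypothetical names, the limiter type is left generic (in this article it would be Guava's RateLimiter), and the fallback to "Class#method" for an empty key is a design suggestion rather than something the original code does.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical registry (not from the article): one lazily created limiter per key. */
class KeyedLimiters<T> {
    private final Map<String, T> limiters = new ConcurrentHashMap<>();
    private final Function<String, T> factory;

    KeyedLimiters(Function<String, T> factory) {
        this.factory = factory;
    }

    /** Returns the limiter for the key, creating it atomically on first use. */
    T forKey(String key) {
        return limiters.computeIfAbsent(key, factory);
    }

    /**
     * Assumed convention: fall back to "Class#method" when the annotation key is
     * empty, so that unrelated methods do not end up sharing one limiter.
     */
    static String resolveKey(String annotationKey, String className, String methodName) {
        if (annotationKey == null || annotationKey.isEmpty()) {
            return className + "#" + methodName;
        }
        return annotationKey;
    }
}
```

Because computeIfAbsent on a ConcurrentHashMap runs the factory at most once per key, concurrent first requests for the same key end up sharing a single limiter instead of each creating their own.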
1. Add AOP dependency
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
2. Create the @Limit annotation
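import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.concurrent.TimeUnit;

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD})
@Documented
public @interface Limit {
    String key() default "";                            // name of the token bucket
    double permitsPerSecond();                          // tokens added per second
    long timeout();                                     // maximum time to wait for a token
    TimeUnit timeunit() default TimeUnit.MILLISECONDS;  // unit of the timeout
    String msg() default "System busy, please try later."; // message returned when throttled
}
3. Implement the AOP aspect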
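import com.google.common.collect.Maps;
import com.google.common.util.concurrent.RateLimiter;
import lombok.extern.slf4j.Slf4j;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.reflect.MethodSignature;
import org.springframework.stereotype.Component;
import org.springframework.web.context.request.RequestContextHolder;
import org.springframework.web.context.request.ServletRequestAttributes;

import javax.servlet.http.HttpServletResponse; // jakarta.servlet.http on Spring Boot 3+
import java.lang.reflect.Method;
import java.util.Map;
// ResultData, ReturnCode and WebUtils are the project's own helper classes (not shown here).

@Slf4j
@Aspect
@Component
public class LimitAop {
    /** One RateLimiter per key, created lazily on first use */
    private final Map<String, RateLimiter> limitMap = Maps.newConcurrentMap();

    @Around("@annotation(com.jianzh5.blog.limit.Limit)")
    public Object around(ProceedingJoinPoint joinPoint) throws Throwable {
        MethodSignature signature = (MethodSignature) joinPoint.getSignature();
        Method method = signature.getMethod();
        Limit limit = method.getAnnotation(Limit.class);
        if (limit != null) {
            // Methods that use the same key (including the default empty key) share one limiter
            String key = limit.key();
            RateLimiter rateLimiter = limitMap.computeIfAbsent(key,
                k -> {
                    RateLimiter rl = RateLimiter.create(limit.permitsPerSecond());
                    log.info("Created token bucket {} with rate {}", k, limit.permitsPerSecond());
                    return rl;
                });
            boolean acquire = rateLimiter.tryAcquire(limit.timeout(), limit.timeunit());
            if (!acquire) {
                log.debug("Token bucket {} acquisition failed", key);
                // Write the error response directly and skip the target method
                responseFail(limit.msg());
                return null;
            }
        }
        return joinPoint.proceed();
    }

    private void responseFail(String msg) {
        HttpServletResponse response = ((ServletRequestAttributes) RequestContextHolder
            .getRequestAttributes()).getResponse();
        ResultData<Object> resultData = ResultData.fail(ReturnCode.LIMIT_ERROR.getCode(), msg);
        WebUtils.writeJson(response, resultData);
    }
}
4. Apply the annotation to controllers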
@Slf4j
@RestController
@RequestMapping("/limit")
public class LimitController {
    @GetMapping("/test2")
    @Limit(key = "limit2", permitsPerSecond = 1, timeout = 500,
           timeunit = TimeUnit.MILLISECONDS,
           msg = "Current queue is long, please try later!")
    public String limit2() {
        log.info("Token bucket limit2 acquired");
        return "ok";
    }

    @GetMapping("/test3")
    @Limit(key = "limit3", permitsPerSecond = 2, timeout = 500,
           timeunit = TimeUnit.MILLISECONDS,
           msg = "System busy, please try later!")
    public String limit3() {
        log.info("Token bucket limit3 acquired");
        return "ok";
    }
}
5. Verify the behavior
Calling http://127.0.0.1:8080/limit/test2 shows normal JSON responses when the rate limit is not exceeded, and error JSON when throttled.
Conclusion
By measuring system capacity during load testing, appropriate rate‑limit parameters can be set to protect services from traffic spikes. This article covered three classic algorithms, demonstrated a Guava‑based implementation, and showed how custom annotations with AOP cleanly separate business logic from rate‑limiting concerns.
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
