Why Most Rate‑Limiting Fails and How to Quickly Add Precise Single‑Node QPS Limiting via AOP & Guava
The article explains why common rate‑limiting approaches often break under high traffic, compares four mainstream algorithms, and provides a step‑by‑step guide to implement a lightweight, non‑intrusive single‑instance QPS limiter using Spring AOP and Guava's token‑bucket RateLimiter.
Problem Overview
Most traffic‑related failures—interface avalanches, crawler attacks, and sudden spikes—originate from rate‑limiting solutions that are non‑standard, imprecise, or inelegant.
Hand-written counters suffer from window-boundary spikes, miscount under concurrency, and are often not thread-safe.
Global filters apply a single threshold to all interfaces, preventing differentiated control.
Simple fixed-window counters can admit up to twice the configured limit across a window boundary, which can trigger an instant avalanche.
Leaky‑bucket algorithms discard bursts and waste system capacity.
Hard‑coded limits in business code are invasive, hard to maintain, and cannot be adjusted dynamically.
Poor implementations ignore cache cleanup, leading to memory leaks and service crashes under crawler load.
1. Comparison of Four Main Limiting Algorithms
1.1 Fixed‑Window Counter
Principle: Count requests within a fixed time slot and reset on timeout.
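Critical flaw: A burst that straddles a window boundary can pass up to twice the limit in a very short span (the tail of one window plus the head of the next), which can overwhelm the service.
Applicability: Rarely suitable for production on its own.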
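The boundary problem described above can be reproduced with a minimal sketch. `FixedWindowLimiter`, its injectable millisecond clock, and the `boundaryBurst` demo are hypothetical names chosen for illustration; they are not part of Guava or of the component built later in this article.

```java
import java.util.function.LongSupplier;

/** Fixed-window counter driven by an injectable millisecond clock (illustrative only). */
class FixedWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private final LongSupplier clock;
    private long windowStart;
    private int count;

    FixedWindowLimiter(int limit, long windowMillis, LongSupplier clock) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.windowStart = (clock.getAsLong() / windowMillis) * windowMillis;
    }

    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        long currentWindow = (now / windowMillis) * windowMillis;
        if (currentWindow != windowStart) { // a new window began: reset the counter
            windowStart = currentWindow;
            count = 0;
        }
        if (count < limit) {
            count++;
            return true;
        }
        return false;
    }

    /** Sends 5 requests at t=999 ms and 5 at t=1000 ms against a 5-per-second limit. */
    static int boundaryBurst() {
        long[] now = {999};
        FixedWindowLimiter limiter = new FixedWindowLimiter(5, 1000, () -> now[0]);
        int allowed = 0;
        for (int i = 0; i < 5; i++) if (limiter.tryAcquire()) allowed++;
        now[0] = 1000; // one millisecond later, but already in the next window
        for (int i = 0; i < 5; i++) if (limiter.tryAcquire()) allowed++;
        return allowed;
    }
}
```

In this sketch `boundaryBurst()` returns 10: all ten requests pass within two milliseconds even though the nominal limit is five per second.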
1.2 Sliding‑Window Counter
Principle: Split time into smaller windows and slide them to improve precision.
Drawback: More precise at window boundaries, but it still does not smooth bursts: an entire window's quota can arrive at once and saturate the thread pool.
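As a rough illustration, the sketch below uses a sliding log, which tracks individual request timestamps instead of sub-window counts; it can be read as the finest-grained form of the sub-window idea, at the cost of memory proportional to the limit. `SlidingWindowLimiter` and its injectable clock are hypothetical names, not taken from any library.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

/** Sliding-log limiter: at most `limit` requests in any trailing window (illustrative only). */
class SlidingWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private final LongSupplier clock;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowLimiter(int limit, long windowMillis, LongSupplier clock) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.clock = clock;
    }

    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        // Drop timestamps that have left the trailing window (now - windowMillis, now].
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= now - windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < limit) {
            timestamps.addLast(now);
            return true;
        }
        return false;
    }

    /** Same scenario as a window-boundary burst: 5 requests at t=999 ms, 5 at t=1000 ms. */
    static int boundaryBurst() {
        long[] now = {999};
        SlidingWindowLimiter limiter = new SlidingWindowLimiter(5, 1000, () -> now[0]);
        int allowed = 0;
        for (int i = 0; i < 5; i++) if (limiter.tryAcquire()) allowed++;
        now[0] = 1000;
        for (int i = 0; i < 5; i++) if (limiter.tryAcquire()) allowed++;
        return allowed;
    }
}
```

Here `boundaryBurst()` returns 5: the second batch is rejected because the first five timestamps are still inside the trailing second. The first five requests were nevertheless admitted in the same instant, which is the burst behavior the drawback refers to.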
1.3 Leaky Bucket
Principle: Drain requests at a constant rate, discarding overflow.
Drawback: Rejects bursts and yields low throughput because idle capacity is unused.
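A minimal sketch of the queue-style leaky bucket follows, assuming a bounded backlog that drains one request per fixed interval. `LeakyBucket`, `offer`, and the demo parameters are hypothetical and only illustrate why idle time does not buy any burst capacity.

```java
import java.util.function.LongSupplier;

/** Leaky bucket with a bounded backlog that drains at a constant rate (illustrative only). */
class LeakyBucket {
    private final long capacity;           // maximum number of queued requests
    private final long leakIntervalMillis; // one request drains per interval
    private final LongSupplier clock;
    private long pending;
    private long lastLeakMillis;

    LeakyBucket(long capacity, long leakIntervalMillis, LongSupplier clock) {
        this.capacity = capacity;
        this.leakIntervalMillis = leakIntervalMillis;
        this.clock = clock;
        this.lastLeakMillis = clock.getAsLong();
    }

    /** Returns true if the request fits into the backlog, false if it overflows. */
    synchronized boolean offer() {
        long now = clock.getAsLong();
        long leaked = (now - lastLeakMillis) / leakIntervalMillis;
        if (leaked > 0) {
            pending = Math.max(0, pending - leaked); // drain, but never below empty
            lastLeakMillis += leaked * leakIntervalMillis;
        }
        if (pending < capacity) {
            pending++;
            return true;
        }
        return false;
    }

    /** Ten idle seconds, then 5 requests at once against a backlog of 2 draining at 5/s. */
    static int burstAfterIdle() {
        long[] now = {0};
        LeakyBucket bucket = new LeakyBucket(2, 200, () -> now[0]);
        now[0] = 10_000;
        int accepted = 0;
        for (int i = 0; i < 5; i++) if (bucket.offer()) accepted++;
        return accepted;
    }
}
```

In this sketch `burstAfterIdle()` returns 2: the ten idle seconds earn no credit, so only as many requests as the backlog can hold are accepted and the rest are discarded.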
1.4 Token‑Bucket (Guava implementation)
Principle: Generate tokens at a steady rate; a request proceeds only when it acquires a token, with a maximum bucket capacity.
Allows bursts: idle time accumulates tokens, which can be consumed instantly.
Smooth limiting: no traffic doubling at window boundaries.
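Cold-start warm-up: in warm-up mode, token generation starts slowly and speeds up to the configured rate, which softens start-up avalanches.
Non-blocking fast-fail: tryAcquire() returns immediately instead of parking the thread (acquire(), by contrast, blocks until a token is available), which keeps threads free under high concurrency.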
Conclusion: Token‑bucket is the preferred single‑node limiting algorithm.
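For comparison with the other three algorithms, here is a generic token-bucket sketch. It is not Guava's source code; `TokenBucket`, the injectable clock, and the choice to start with an empty bucket are assumptions made for illustration.

```java
import java.util.function.LongSupplier;

/** Classic token bucket: steady refill, bounded capacity, non-blocking acquire (illustrative only). */
class TokenBucket {
    private final double capacity;
    private final double refillPerMilli;
    private final LongSupplier clock;
    private double tokens;      // starts empty in this sketch
    private long lastRefillMillis;

    TokenBucket(double permitsPerSecond, double capacity, LongSupplier clock) {
        this.capacity = capacity;
        this.refillPerMilli = permitsPerSecond / 1000.0;
        this.clock = clock;
        this.lastRefillMillis = clock.getAsLong();
    }

    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        // Credit tokens for the elapsed time, capped at the bucket capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefillMillis) * refillPerMilli);
        lastRefillMillis = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false; // fail fast instead of waiting for the next token
    }

    /** Ten idle seconds, then 8 requests at once against 5 permits/s with capacity 5. */
    static int burstAfterIdle() {
        long[] now = {0};
        TokenBucket bucket = new TokenBucket(5.0, 5.0, () -> now[0]);
        now[0] = 10_000;
        int allowed = 0;
        for (int i = 0; i < 8; i++) if (bucket.tryAcquire()) allowed++;
        return allowed;
    }
}
```

In this sketch `burstAfterIdle()` returns 5: idle time is converted into tokens up to the capacity, so a burst of that size passes at once and anything beyond it is rejected immediately.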
2. Guava RateLimiter Mechanics
2.1 SmoothBursty (core burst handling)
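In the default mode (SmoothBursty), RateLimiter does not use fixed time slices; it records the earliest time at which the next permit may be released. While the limiter is idle, unused permits accumulate, up to one second's worth for a limiter built with RateLimiter.create(qps), so a sudden burst can consume the stored permits immediately.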
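The "next permissible release time" idea can be modeled in a few lines. The sketch below is a simplified model written for this explanation, not Guava's implementation: `NextFreeTimeLimiter` and its fields are hypothetical names, time is passed in explicitly as microseconds, and only single-permit requests are handled.

```java
/** Simplified model of a burst-capable limiter that tracks the next free time (illustrative only). */
class NextFreeTimeLimiter {
    private final double stableIntervalMicros; // spacing between permits at the steady rate
    private final double maxStoredPermits;     // assumption: one second's worth of permits
    private double storedPermits;
    private long nextFreeMicros;               // earliest time the next request may pass

    NextFreeTimeLimiter(double permitsPerSecond, long nowMicros) {
        this.stableIntervalMicros = 1_000_000.0 / permitsPerSecond;
        this.maxStoredPermits = permitsPerSecond;
        this.nextFreeMicros = nowMicros;
    }

    synchronized boolean tryAcquire(long nowMicros) {
        if (nowMicros > nextFreeMicros) {
            // Idle time since the last free slot turns into stored permits, up to the cap.
            double earned = (nowMicros - nextFreeMicros) / stableIntervalMicros;
            storedPermits = Math.min(maxStoredPermits, storedPermits + earned);
            nextFreeMicros = nowMicros;
        }
        if (nextFreeMicros > nowMicros) {
            return false; // the next slot lies in the future: reject without waiting
        }
        double fromStorage = Math.min(1.0, storedPermits);
        double fresh = 1.0 - fromStorage;
        // Stored permits cost no delay; a fresh permit pushes the next free time forward.
        nextFreeMicros += (long) (fresh * stableIntervalMicros);
        storedPermits -= fromStorage;
        return true;
    }
}
```

With 5 permits per second, this model admits the first request at t=0 and rejects a second one at the same instant. After two idle seconds it admits six back-to-back requests: five paid from storage plus one fresh permit whose cost is charged to whoever comes next. That "pay later" behavior mirrors what Guava's documentation describes, where the cost of a request is borne by the subsequent request.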
2.2 SmoothWarmingUp (cold‑start warm‑up)
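During service start-up, JIT compilation has not kicked in and connection pools are not yet filled, so serving the full QPS immediately can overwhelm the service. Warm-up mode issues permits slowly at first and speeds up gradually until the configured QPS is reached at the end of the warm-up period, which mitigates the start-up traffic avalanche. The same ramp applies again whenever the limiter has sat idle long enough to return to its cold state.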
2.3 Performance Characteristics
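Short critical section: state updates run inside a small synchronized block on an internal mutex (not lock-free CAS), so contention stays light.
Pure in-memory operations, no I/O or Redis network overhead.
tryAcquire is non-blocking and does not tie up business threads.
Reported to run stably above 100k QPS on a single machine; treat this as an order-of-magnitude figure and benchmark on your own hardware.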
3. Dependencies and Common Components
3.1 Maven Dependencies
<!-- Web -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- AOP -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
<!-- Guava RateLimiter core -->
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>32.1.3-jre</version>
</dependency>
<!-- Actuator; hot refresh additionally requires Spring Cloud (spring-cloud-context), which provides @RefreshScope and the refresh endpoint -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
3.2 Unified Response (429 standard for limiting)
import lombok.Data;

@Data
public class Result<T> {
    private Integer code;
    private String msg;
    private T data;

    public static <T> Result<T> success(T data) {
        Result<T> r = new Result<>();
        r.setCode(200);
        r.setMsg("success");
        r.setData(data);
        return r;
    }

    // Rate-limited response: 429 is carried in the body's code field; the HTTP status
    // itself stays unchanged unless the web layer sets it explicitly.
    public static <T> Result<T> limit(String msg) {
        Result<T> r = new Result<>();
        r.setCode(429);
        r.setMsg(msg);
        return r;
    }

    public static <T> Result<T> error(String msg) {
        Result<T> r = new Result<>();
        r.setCode(500);
        r.setMsg(msg);
        return r;
    }
}
3.3 Limiting Dimension Enum
public enum LimitType {
    ALL,      // global interface dimension
    IP,       // IP dimension for anti-scraping
    USER,     // User-ID dimension for anti-duplicate actions
    API_IP,   // Interface + IP combination
    API_USER  // Interface + User combination
}
3.4 Limiting Annotation
import java.lang.annotation.*;

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
@Documented
public @interface RateLimit {
    /** Permits per second (QPS) */
    double qps() default 10;
    /** Warm-up period in seconds; 0 disables warm-up */
    int warmUp() default 0;
    /** Message returned when limited */
    String msg() default "Too many requests, please try again later";
    /** Limiting dimension */
    LimitType limitType() default LimitType.ALL;
    /** Silent mode: reject without an error message or log entry */
    boolean silent() default false;
}
3.5 Utility for IP and User Context
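import jakarta.servlet.http.HttpServletRequest;

public class HttpUtil {
    // Resolve the client IP. X-Forwarded-For and X-Real-IP can be set by the client unless a
    // trusted reverse proxy overwrites them, so rely on them only behind such a proxy.
    public static String getIp(HttpServletRequest request) {
        String ip = request.getHeader("X-Forwarded-For");
        if (ip != null && !ip.isEmpty() && !"unknown".equalsIgnoreCase(ip)) {
            // The header may hold a chain "client, proxy1, proxy2"; the first entry is the client
            return ip.split(",")[0].trim();
        }
        ip = request.getHeader("X-Real-IP");
        if (ip != null && !ip.isEmpty() && !"unknown".equalsIgnoreCase(ip)) {
            return ip.trim();
        }
        return request.getRemoteAddr();
    }

    // Placeholder for token-based user ID extraction: every request carrying a token maps to
    // the same "login_user" key, so replace this with real token parsing before using USER limits.
    public static String getUserId(HttpServletRequest request) {
        String token = request.getHeader("token");
        return token == null ? "anonymous" : "login_user";
    }
}
4. Rate‑Limit Aspect Implementation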
import com.google.common.util.concurrent.RateLimiter;
import jakarta.servlet.http.HttpServletRequest;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Pointcut;
import org.aspectj.lang.reflect.MethodSignature;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import org.springframework.web.context.request.RequestContextHolder;
import org.springframework.web.context.request.ServletRequestAttributes;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
@Slf4j
@Aspect
@Component
@RequiredArgsConstructor
public class RateLimitAspect {
    private final RateLimitProperties rateLimitProperties;
    // Cache of limiters, one per limit key
    private static final Map<String, RateLimiter> LIMITER_MAP = new ConcurrentHashMap<>();

    @Pointcut("@annotation(com.demo.annotation.RateLimit)")
    public void pointCut() {}

    @Around("pointCut()")
    public Object around(ProceedingJoinPoint pjp) throws Throwable {
        // Global switch explicitly off → pass through (an unset value keeps limiting on)
        if (Boolean.FALSE.equals(rateLimitProperties.getEnable())) {
            return pjp.proceed();
        }
        ServletRequestAttributes attributes = (ServletRequestAttributes) RequestContextHolder.getRequestAttributes();
        // No servlet request bound to this thread (e.g. a scheduled or async call) → pass through
        if (attributes == null) {
            return pjp.proceed();
        }
        HttpServletRequest request = attributes.getRequest();
        String ip = HttpUtil.getIp(request);
        // Whitelist pass-through
        if (isWhiteIp(ip)) {
            return pjp.proceed();
        }
        MethodSignature signature = (MethodSignature) pjp.getSignature();
        RateLimit rateLimit = signature.getMethod().getAnnotation(RateLimit.class);
        // Generate a unique key based on the limiting dimension
        String limitKey = generateLimitKey(request, rateLimit);
        // Obtain or create the limiter for this key
        RateLimiter limiter = LIMITER_MAP.computeIfAbsent(limitKey, k -> {
            if (rateLimit.warmUp() > 0) {
                return RateLimiter.create(rateLimit.qps(), rateLimit.warmUp(), TimeUnit.SECONDS);
            }
            return RateLimiter.create(rateLimit.qps());
        });
        // Try to acquire a token without blocking
        boolean acquired = limiter.tryAcquire();
        if (!acquired) {
            if (rateLimit.silent()) {
                return null; // silent discard; only safe for void or reference return types
            }
            log.warn("[Rate limit] url:{}, ip:{}, qps:{}", request.getRequestURI(), ip, rateLimit.qps());
            // Assumes the annotated method's return type is compatible with Result
            return Result.limit(rateLimit.msg());
        }
        return pjp.proceed();
    }
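    private String generateLimitKey(HttpServletRequest request, RateLimit rateLimit) {
        String uri = request.getRequestURI();
        String ip = HttpUtil.getIp(request);
        String userId = HttpUtil.getUserId(request);
        // Tag each dimension so that an IP and a user ID can never collide in the shared map
        return switch (rateLimit.limitType()) {
            case ALL -> uri;
            case IP -> "ip:" + ip;
            case USER -> "user:" + userId;
            case API_IP -> uri + ":ip:" + ip;
            case API_USER -> uri + ":user:" + userId;
        };
    }

    private boolean isWhiteIp(String ip) {
        // Entries ending in '*' match by prefix; all others must match exactly, so that
        // "127.0.0.1" does not also whitelist "127.0.0.10". A missing list whitelists nothing.
        return rateLimitProperties.getWhiteIp() != null
                && rateLimitProperties.getWhiteIp().stream().anyMatch(w -> w.endsWith("*")
                        ? ip.startsWith(w.substring(0, w.length() - 1))
                        : ip.equals(w));
    }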
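    // Periodic cleanup to bound memory. Note that clear() drops every limiter, not only idle
    // ones: active keys are rebuilt on their next request with fresh token and warm-up state.
    // The interval is configured in seconds; the trailing "000" converts it to milliseconds.
    @Scheduled(fixedRateString = "${rate-limit.clean-interval:300}000")
    public void clearIdleLimiter() {
        LIMITER_MAP.clear();
        log.info("[Rate limiter] scheduled cleanup of cached limiters finished");
    }
}
4.1 AOP Configuration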
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.EnableAspectJAutoProxy;
import org.springframework.scheduling.annotation.EnableScheduling;
@Configuration
@EnableScheduling
@EnableAspectJAutoProxy(proxyTargetClass = true, exposeProxy = true)
public class AopConfig {}
5. Dynamic Configuration (YAML) and Hot Refresh
# Single-node rate-limit configuration
rate-limit:
  # global switch
  enable: true
  # idle limiter cleanup interval (seconds)
  clean-interval: 300
  # whitelist IPs (supports wildcard '*')
  white-ip: 127.0.0.1,192.168.*
  # default QPS when annotation does not specify
  default-qps: 10
5.1 Configuration Binding Class
import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.cloud.context.config.annotation.RefreshScope;
import org.springframework.stereotype.Component;
import java.util.List;
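@Data
@Component
// @RefreshScope comes from Spring Cloud (spring-cloud-context), which must be on the classpath
@RefreshScope
@ConfigurationProperties(prefix = "rate-limit")
public class RateLimitProperties {
    private Boolean enable;
    private Long cleanInterval;
    private List<String> whiteIp;
    // Bound from default-qps; the aspect shown in section 4 does not read it
    private Double defaultQps;
}
6. Usage Examples (Annotation‑Based Rate Limiting)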
Global interface limiting (shared QPS) – suitable for admin panels, internal APIs, low-concurrency queries:
@RateLimit(qps = 20, limitType = LimitType.ALL, msg = "This endpoint is receiving too many requests, please retry later")
Per-IP limiting (anti-scraping) – for public endpoints without login:
@RateLimit(qps = 5, limitType = LimitType.IP, msg = "Too many requests from this IP")
Per-User limiting (prevent duplicate actions) – for order, lottery, payment, form submission:
@RateLimit(qps = 2, limitType = LimitType.USER, msg = "Too many operations, please try again later")
Interface + IP precise limiting – when multiple interfaces share the same IP but need separate control:
@RateLimit(qps = 8, limitType = LimitType.API_IP)
Cold-start warm-up limiting – for homepage, activity pages, hot-spot interfaces:
@RateLimit(qps = 30, warmUp = 5, limitType = LimitType.ALL, msg = "System busy, please try again later")
Silent limiting (empty response, no error message) – for logging, heartbeat, or telemetry endpoints where the client should not see a limit error:
@RateLimit(qps = 15, silent = true)
These examples show how a single annotation provides fine-grained, non-intrusive rate control across several dimensions. Note that annotation attributes such as qps are compile-time constants; only the properties from section 5, such as the global switch and the whitelist, can change at runtime through a refresh.
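One operational caveat on the aspect in section 4: its scheduled task calls LIMITER_MAP.clear(), which discards active limiters together with idle ones and so resets their token state. A hedged alternative is to record the last access time per key and evict only entries that have been idle too long. The sketch below uses only the JDK; `IdleEvictingCache` and its methods are hypothetical names, and time is passed in explicitly to keep the example deterministic. Guava's own CacheBuilder with expireAfterAccess is another option worth considering.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Keyed cache that evicts only entries idle longer than a threshold (illustrative only). */
class IdleEvictingCache<V> {
    private static final class Entry<V> {
        final V value;
        volatile long lastAccessMillis;

        Entry(V value, long nowMillis) {
            this.value = value;
            this.lastAccessMillis = nowMillis;
        }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long maxIdleMillis;

    IdleEvictingCache(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Returns the cached value for the key, creating it on first use, and marks it as used. */
    V get(String key, Function<String, V> factory, long nowMillis) {
        Entry<V> entry = entries.computeIfAbsent(key, k -> new Entry<>(factory.apply(k), nowMillis));
        entry.lastAccessMillis = nowMillis;
        return entry.value;
    }

    /** Removes entries not accessed within maxIdleMillis; returns how many were removed. */
    int evictIdle(long nowMillis) {
        int before = entries.size();
        entries.values().removeIf(e -> nowMillis - e.lastAccessMillis > maxIdleMillis);
        return before - entries.size();
    }

    int size() {
        return entries.size();
    }
}
```

In the aspect, the value type would be RateLimiter and evictIdle would be called from the scheduled method with System.currentTimeMillis(). An entry can still be evicted at the same moment a request touches it; in that case the next request simply creates a fresh limiter, which is the same outcome the current clear() produces for every key.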
Java Tech Workshop
Focused on Java backend technologies, sharing fundamentals, multithreading, JVM, the Spring ecosystem, microservices, distributed systems, high concurrency, source‑code analysis, and practical experience. Continuously delivers high‑quality original content, interview guides, and learning roadmaps to help Java developers progress from beginner to advanced, enhancing technical skills and core competitiveness.
