
Comprehensive Guide to Implementing Rate Limiting in Microservices Using Guava, Sentinel, Redis, and a Custom Spring Boot Starter

This article provides an in‑depth tutorial on designing and implementing various rate‑limiting strategies—such as token bucket, leaky bucket, and sliding window—in Java microservice architectures, with practical code examples using Guava, Sentinel, Redis+Lua, and a reusable Spring Boot starter.

Top Architect

1. Background

Rate limiting is crucial in a micro‑service system: a single overloaded service can trigger a cascading failure (an "avalanche"), in which requests to that service block, queue, or time out until the callers' threads and other JVM resources are exhausted.

2. Rate‑Limiting Overview

The right rate‑limiting solution depends on the technology stack (Dubbo, Spring Cloud, Spring Boot, etc.) as well as on the specific architecture and business requirements.

2.1 Dubbo Service Governance Mode

Dubbo's default protocol runs over long‑lived TCP connections on Netty, which can be advantageous over HTTP in certain scenarios. Rate‑limiting options include:

Client‑side: semaphore limiting, connection‑count limiting (socket‑>TCP)

Server‑side: thread‑pool limiting, semaphore limiting, receive‑count limiting (socket‑>TCP)
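
Semaphore limiting, listed above for both sides, caps how many calls run at the same time rather than how many arrive per second. The following is a minimal plain‑JDK sketch of that idea, not Dubbo's own implementation; the class name SemaphoreLimiter and its methods are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper (not a Dubbo class): caps concurrent executions with a semaphore.
class SemaphoreLimiter {
    private final Semaphore permits;

    SemaphoreLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs the task if a permit is free; otherwise returns the fallback immediately. */
    <T> T call(Supplier<T> task, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback;        // over the concurrency cap: reject without blocking
        }
        try {
            return task.get();
        } finally {
            permits.release();      // always hand the permit back, even on exceptions
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

With a limit of 1, a call made while another call is still in flight receives the fallback value instead of waiting.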

2.1.2 Thread‑Pool Settings

Dubbo supports four thread‑pool types. The <dubbo:protocol> tag can configure core size, maximum size, and queue length to achieve basic throttling.
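
To see why core size, maximum size, and queue length together act as a throttle, the sketch below uses the JDK's ThreadPoolExecutor as a stand‑in. Dubbo's pool is configured through its own tag, so this only illustrates the principle; BoundedPoolDemo and its methods are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: a bounded pool rejects work beyond (max threads + queue length).
class BoundedPoolDemo {

    /** Builds a pool that throws once all threads are busy and the queue is full. */
    static ThreadPoolExecutor newBoundedPool(int core, int max, int queueLength) {
        return new ThreadPoolExecutor(
                core, max,
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueLength),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** Submits `tasks` jobs that block until released and returns how many were rejected. */
    static int countRejected(ThreadPoolExecutor pool, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await();                      // simulate a slow request
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++;                                   // pool and queue are both full
            }
        }
        release.countDown();                                  // let the accepted jobs finish
        pool.shutdown();
        return rejected;
    }
}
```

With core = 1, max = 2, and a queue of 1, at most three slow jobs are accepted at once, so submitting five of them rejects two.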

2.1.3 Integrating Third‑Party Components

For Spring Boot projects you can directly introduce libraries such as Hystrix, Guava, or Sentinel SDKs, or even develop a custom solution.

2.2 Spring Cloud Service Governance Mode

Spring Cloud and Spring Cloud Alibaba already provide built‑in rate‑limiting components that work out‑of‑the‑box.

2.2.1 Hystrix

Hystrix, an open‑source Netflix library, offers request‑level throttling, circuit breaking, and fallback capabilities.

2.2.2 Sentinel

Sentinel is a flow‑control component in the Spring Cloud Alibaba ecosystem that supports rate limiting, traffic shaping, circuit breaking, system load protection, and hotspot protection.

2.3 Gateway‑Level Rate Limiting

When many services need protection, a gateway can filter malicious requests, crawlers, or attacks, providing a global safeguard for the whole system.

3. Common Rate‑Limiting Algorithms

3.1 Token Bucket Algorithm

The token bucket is one of the most widely used algorithms. Tokens are generated at a fixed rate and stored in a bucket of limited capacity; a request proceeds only when it can acquire a token, so short bursts up to the bucket capacity are allowed.
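
The description above can be sketched in a few lines of plain Java. This is an illustrative, single‑JVM sketch rather than any library's implementation; the class name TokenBucket is hypothetical, and the clock is passed in explicitly so the behaviour is easy to check.

```java
// Hypothetical sketch of a token bucket; not the Guava implementation.
class TokenBucket {
    private final long capacity;            // maximum tokens the bucket can hold
    private final double tokensPerSecond;   // refill rate
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double tokensPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.tokensPerSecond = tokensPerSecond;
        this.tokens = capacity;             // start full so an initial burst is allowed
        this.lastRefillNanos = nowNanos;
    }

    /** Refills according to elapsed time, then takes one token if available. */
    synchronized boolean tryAcquire(long nowNanos) {
        long elapsed = Math.max(0L, nowNanos - lastRefillNanos);
        tokens = Math.min(capacity, tokens + elapsed * tokensPerSecond / 1_000_000_000.0);
        lastRefillNanos = Math.max(lastRefillNanos, nowNanos);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;                       // bucket is empty: the request is throttled
    }

    /** Convenience overload that uses the system clock. */
    boolean tryAcquire() {
        return tryAcquire(System.nanoTime());
    }
}
```

A bucket with capacity 2 and a rate of 1 token per second admits two back‑to‑back requests, rejects the third, and admits one more after enough time has passed to refill a whole token.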

3.2 Leaky Bucket Algorithm

The leaky bucket resembles the token bucket, but the bucket holds the requests themselves instead of tokens. Requests enter the bucket and drain out of it at a constant rate; when the bucket is full, new requests are dropped.
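
A minimal sketch of the leaky bucket, assuming the common "meter" variant in which the bucket tracks a water level instead of queuing the requests themselves. The class name LeakyBucket is hypothetical and the clock is passed in explicitly.

```java
// Hypothetical sketch of a leaky bucket used as a meter (it counts requests, it does not queue them).
class LeakyBucket {
    private final double capacity;        // how many requests the bucket can hold
    private final double leakPerSecond;   // constant outflow rate
    private double water;                 // current fill level
    private long lastLeakNanos;

    LeakyBucket(long capacity, double leakPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.leakPerSecond = leakPerSecond;
        this.water = 0.0;                 // start empty
        this.lastLeakNanos = nowNanos;
    }

    /** Drains the bucket for the elapsed time, then adds the request if it still fits. */
    synchronized boolean tryAccept(long nowNanos) {
        long elapsed = Math.max(0L, nowNanos - lastLeakNanos);
        water = Math.max(0.0, water - elapsed * leakPerSecond / 1_000_000_000.0);
        lastLeakNanos = Math.max(lastLeakNanos, nowNanos);
        if (water + 1.0 > capacity) {
            return false;                 // bucket is full: drop the request
        }
        water += 1.0;
        return true;
    }
}
```

Unlike the token bucket, this bucket starts empty and fills as requests arrive, so the sustained outflow never exceeds the leak rate.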

3.3 Sliding Window Algorithm

A time window slides forward continuously. The window is divided into small slots (e.g., 1 second each) and each slot counts the number of requests. The sum of all slots determines whether the request exceeds the limit.
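
The slot bookkeeping can be sketched with a small ring buffer. The sketch assumes non‑negative millisecond timestamps; the class name SlidingWindowLimiter is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of a slot-based sliding window counter.
class SlidingWindowLimiter {
    private final int limit;          // maximum requests per whole window
    private final long slotMillis;    // width of one slot
    private final long[] slotIds;     // absolute slot number stored in each cell
    private final int[] counts;       // requests counted in each cell

    SlidingWindowLimiter(int limit, int slots, long slotMillis) {
        this.limit = limit;
        this.slotMillis = slotMillis;
        this.slotIds = new long[slots];
        this.counts = new int[slots];
        Arrays.fill(slotIds, -1L);    // -1 marks a cell that has never been used
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long currentSlot = nowMillis / slotMillis;
        int index = (int) (currentSlot % slotIds.length);
        if (slotIds[index] != currentSlot) {
            slotIds[index] = currentSlot;   // the cell held an expired slot: reuse it
            counts[index] = 0;
        }
        int total = 0;
        for (int i = 0; i < counts.length; i++) {
            // Only cells whose slot still lies inside the window contribute to the sum.
            if (slotIds[i] > currentSlot - slotIds.length && slotIds[i] <= currentSlot) {
                total += counts[i];
            }
        }
        if (total >= limit) {
            return false;                   // the window is already at its limit
        }
        counts[index]++;
        return true;
    }
}
```

For example, with a limit of 3 over two 1‑second slots, two requests in the first second and one in the next fill the window; capacity returns only once the first slot slides out.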

4. Practical Implementations

4.1 Guava‑Based Rate Limiting

Add the Guava dependency and create a custom annotation:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface RateConfigAnno {
    String limitType();
    double limitCount() default 5d;
}

Implement an AOP class that obtains a RateLimiter from a helper, tries to acquire a token, and returns a JSON error when throttled.

@Aspect
@Component
public class GuavaLimitAop {
    private static final Logger logger = LoggerFactory.getLogger(GuavaLimitAop.class);

    // @Around instead of @Before: a @Before advice cannot stop the target method,
    // so a throttled request would still execute after the error was written.
    @Around("execution(@RateConfigAnno * *(..))")
    public Object limit(ProceedingJoinPoint joinPoint) throws Throwable {
        Method currentMethod = getCurrentMethod(joinPoint);
        if (currentMethod == null) return joinPoint.proceed();
        RateConfigAnno anno = currentMethod.getAnnotation(RateConfigAnno.class);
        RateLimiter rateLimiter = RateLimitHelper.getRateLimiter(anno.limitType(), anno.limitCount());
        if (rateLimiter.tryAcquire()) {
            return joinPoint.proceed();   // token acquired: run the target method
        }
        // Throttled: write the JSON error and skip the target method.
        logger.warn("Rate limited: {}", currentMethod.getName());
        HttpServletResponse resp = ((ServletRequestAttributes) RequestContextHolder.getRequestAttributes()).getResponse();
        JSONObject json = new JSONObject();
        json.put("success", false);
        json.put("msg", "Rate limited");
        output(resp, json.toJSONString());
        return null;                      // assumes the annotated method does not return a primitive
    }
    // ...output method and helper methods omitted for brevity
}
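
The RateLimitHelper used above is not shown in the article. One plausible shape for it is a registry that creates one limiter per limitType and reuses it on later calls. The sketch below is an assumption, written generically so that it runs without Guava on the classpath; LimiterRegistry is a hypothetical name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.DoubleFunction;

// Hypothetical registry: one limiter instance per limit type, created lazily.
class LimiterRegistry<L> {
    private final Map<String, L> limiters = new ConcurrentHashMap<>();
    private final DoubleFunction<L> factory;

    LimiterRegistry(DoubleFunction<L> factory) {
        this.factory = factory;
    }

    /**
     * Returns the limiter for limitType, creating it on first use.
     * The rate is applied only at creation time; later calls reuse the instance.
     */
    L get(String limitType, double permitsPerSecond) {
        return limiters.computeIfAbsent(limitType, key -> factory.apply(permitsPerSecond));
    }
}
```

With Guava available, the registry could be created as new LimiterRegistry<>(RateLimiter::create), since RateLimiter.create(double) builds a limiter with the given permits per second. Reusing the instance matters: a limiter created afresh for every request would never throttle anything.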

4.2 Sentinel‑Based Rate Limiting

Add the Sentinel core dependency and define a custom annotation:

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface SentinelLimitAnnotation {
    String resourceName();
    int limitCount() default 50;
}

The AOP class registers a flow rule and uses SphU.entry to protect the resource:

@Aspect
@Component
public class SentinelMethodLimitAop {
    @Pointcut("@annotation(com.congge.sentinel.SentinelLimitAnnotation)")
    public void rateLimit() {}

    @Around("rateLimit()")
    public Object around(ProceedingJoinPoint joinPoint) throws Throwable {
        Method method = ((MethodSignature) joinPoint.getSignature()).getMethod();
        SentinelLimitAnnotation anno = method.getAnnotation(SentinelLimitAnnotation.class);
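        // Assumption: initFlowRule is idempotent. If it reloads rules through FlowRuleManager
        // on every call, register each resource once (for example at startup) instead.
        initFlowRule(anno.resourceName(), anno.limitCount());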
        Entry entry = null;
        try {
            entry = SphU.entry(anno.resourceName());
            return joinPoint.proceed();
        } catch (BlockException e) {
            return "Blocked by Sentinel";
        } finally {
            if (entry != null) entry.exit();
        }
    }
    // initFlowRule implementation omitted
}

4.3 Redis + Lua Rate Limiting

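Redis offers high throughput and executes each Lua script atomically, so the read, check, and increment steps of a counter cannot interleave across concurrent requests. The approach consists of a Lua script, a custom annotation, a Redis configuration bean, and an AOP interceptor that executes the script.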

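-- Fixed-window counter. KEYS[1] identifies the caller and method, ARGV[1] is the limit per window.
local key = "rate.limit:" .. KEYS[1]
local limit = tonumber(ARGV[1])
local current = tonumber(redis.call('GET', key) or "0")
if current + 1 > limit then
  return 0  -- over the limit: reject
end
current = redis.call('INCR', key)
if current == 1 then
  -- Start the 2-second window on the first request only; refreshing the TTL
  -- on every request would keep extending the window under steady traffic.
  redis.call('EXPIRE', key, 2)
end
return current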

The interceptor builds a unique key from the client IP, class name, method name, and annotation key, runs the script, and proceeds only when the script returns a non‑zero count (0 means the limit was exceeded).
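
The key construction and the interpretation of the script result can be sketched as two small helpers. The class name RateLimitKeys, the method names, and the "-" separator are assumptions for illustration; the article does not show the interceptor's code.

```java
// Hypothetical helpers for the Redis interceptor; names and separator are assumptions.
class RateLimitKeys {

    /** Joins the four parts into one key; "-" is used because IPv6 addresses contain ":". */
    static String build(String ip, String className, String methodName, String annotationKey) {
        return String.join("-", ip, className, methodName, annotationKey);
    }

    /** The script returns 0 when the limit is exceeded and the new count otherwise. */
    static boolean isAllowed(Long scriptResult) {
        return scriptResult != null && scriptResult != 0L;
    }
}
```

Including the IP in the key makes the limit apply per client and per method; dropping it would turn the same script into a global limit for that method.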

5. Building a Custom Spring Boot Starter

To avoid duplicating rate‑limiting code across many micro‑services, the article shows how to package the annotations, AOP classes, and spring.factories entry into a reusable starter JAR.

# src/main/resources/META-INF/spring.factories
org.springframework.boot.autoconfigure.EnableAutoConfiguration=\
  com.congge.aop.SemaphoreLimiterAop,\
  com.congge.aop.GuavaLimiterAop,\
  com.congge.aop.SentinelLimiterAop

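After publishing the JAR, other services can simply add the dependency and annotate methods with @TokenBucketLimiter, @ShLimiter, or @SentinelLimiter to obtain out‑of‑the‑box throttling. Note that Spring Boot 3 no longer reads auto‑configuration classes from spring.factories; there the same class names are listed one per line in META-INF/spring/org.springframework.boot.autoconfigure.AutoConfiguration.imports.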

6. Conclusion

The article demonstrates several practical rate‑limiting techniques, explains their underlying algorithms, and provides a starter‑based solution that can be shared across micro‑service projects, which simplifies traffic control and helps keep the system stable.

Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
