How to Implement Multi‑Dimensional Bandwidth Throttling in Spring Boot 3
This guide explains how to build a complete multi‑dimensional network bandwidth throttling solution in Spring Boot 3 using a custom token‑bucket algorithm, HandlerInterceptor, HttpServletResponseWrapper, and RateLimitedOutputStream to precisely control download, video streaming, and API traffic.
Overview
The solution implements the core token-bucket logic by hand rather than relying on a library. A custom HandlerInterceptor intercepts requests, an HttpServletResponseWrapper wraps the response stream, and a RateLimitedOutputStream controls the output speed for scenarios such as file download and video streaming.
Why Bandwidth Throttling?
File download services: limit free users to 200 KB/s and VIP users to 2 MB/s to protect the service experience and encourage upgrades.
Video streaming: assign a different bandwidth cap per resolution (e.g., 480P → 500 KB/s, 1080P → 3 MB/s) so that high-bitrate streams cannot monopolize server bandwidth.
API protection: large-payload endpoints (e.g., report export) can consume the entire outbound bandwidth if not limited, which degrades service for other users.
Token Bucket Algorithm
The token‑bucket algorithm is a classic traffic‑shaping technique. A bucket receives tokens at a fixed refill rate; each data transmission must consume a corresponding number of tokens.
Key Parameters
Capacity: maximum burst size. A 200 KB capacity allows at most 200 KB of data to be sent continuously before waiting for new tokens.
Refill Rate: long-term average speed. Refilling 200 KB per second yields an average throughput of 200 KB/s.
Chunk Size: size of each write operation. Splitting 8 KB into four 2 KB writes with token checks produces smoother traffic.
Algorithm Flow
Before sending data:
Calculate the time elapsed since the last refill.
Compute new tokens as elapsedTime × refillRate and add them to the bucket (capped by capacity).
Update the token count.
During data transmission:
Check whether enough tokens are available.
If sufficient, deduct the tokens and send the data immediately.
If insufficient, calculate the wait time as missingTokens / refillRate, sleep for that long, then consume the tokens.
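The arithmetic in this flow can be checked with a few lines of Java. This is a minimal sketch, not the article's TokenBucket: the class and method names (TokenMath, refilled, waitNanos) are illustrative assumptions, and tokens are taken to mean bytes.

```java
/** Worked arithmetic for the refill and wait steps; all names here are illustrative. */
final class TokenMath {
    static final long NANOS_PER_SECOND = 1_000_000_000L;

    /** Token count after a refill: adds the tokens earned over elapsedNanos, capped at capacity. */
    static long refilled(long tokens, long capacity, long refillRate, long elapsedNanos) {
        long earned = (elapsedNanos * refillRate) / NANOS_PER_SECOND;
        return Math.min(capacity, tokens + earned);
    }

    /** Nanoseconds to wait until `permits` tokens are available; 0 if they already are. */
    static long waitNanos(long tokens, long permits, long refillRate) {
        if (tokens >= permits) {
            return 0L;
        }
        long missing = permits - tokens;
        return (missing * NANOS_PER_SECOND) / refillRate;
    }
}
```

With a 200 KB/s bucket (204800 tokens per second) holding 1024 tokens, 10 ms of elapsed time earns 2048 tokens, giving 3072. Sending an 8192-byte chunk then leaves 5120 tokens missing, which corresponds to a 25 ms wait.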
Technical Design
Overall Flow
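Request flow:
┌──────────────────────────────────────────────────────────────────┐
│ 1. DispatcherServlet dispatches the request                      │
└──────────────────────────────────────────────────────────────────┘
                                  ↓
┌──────────────────────────────────────────────────────────────────┐
│ 2. BandwidthLimitInterceptor.preHandle()                         │
│    - Parses the @BandwidthLimit annotation                       │
│    - Gets a TokenBucket from BandwidthLimitManager               │
│    - Creates a BandwidthLimitResponseWrapper and stores it as a  │
│      request attribute                                           │
└──────────────────────────────────────────────────────────────────┘
                                  ↓
┌──────────────────────────────────────────────────────────────────┐
│ 3. Controller handles the request                                │
│    - Gets the wrapped response via                               │
│      BandwidthLimitHelper.getLimitedResponse()                   │
│    - Writes to the response stream (throttled automatically)     │
└──────────────────────────────────────────────────────────────────┘
                                  ↓
┌──────────────────────────────────────────────────────────────────┐
│ 4. BandwidthLimitInterceptor.afterCompletion()                   │
│    - Cleans up resources and closes the stream                   │
└──────────────────────────────────────────────────────────────────┘
Why HandlerInterceptor?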
Spring Boot offers two common ways to intercept requests: Filter and HandlerInterceptor. The solution chooses HandlerInterceptor because annotation parsing requires access to the HandlerMethod object, which is only available after the dispatcher has resolved the handler. Filters run before handler resolution and cannot read method‑level annotations such as @BandwidthLimit.
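To make the annotation lookup concrete: in Spring MVC the handler passed to preHandle() is a HandlerMethod for annotated controllers, and HandlerMethod.getMethod() exposes the underlying java.lang.reflect.Method. The sketch below shows only the reflection step, using plain JDK classes so it runs without Spring. The cut-down @BandwidthLimit, DemoController, and AnnotationLookup are stand-ins invented for illustration, not the article's classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Hypothetical, cut-down stand-in for the article's @BandwidthLimit annotation. */
@Retention(RetentionPolicy.RUNTIME) // must be retained at runtime to be visible via reflection
@Target(ElementType.METHOD)
@interface BandwidthLimit {
    long value();
}

/** Hypothetical controller that exists only to provide annotated and plain methods. */
class DemoController {
    @BandwidthLimit(200)
    public void download() { }

    public void unlimited() { }
}

final class AnnotationLookup {
    /** Returns the annotation on the resolved handler method, or null if the method is not throttled. */
    static BandwidthLimit find(Method handlerMethod) {
        return handlerMethod.getAnnotation(BandwidthLimit.class);
    }

    /** Convenience overload for the demo: resolves a no-argument public method by name first. */
    static BandwidthLimit find(Class<?> type, String methodName) {
        try {
            return find(type.getMethod(methodName));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such method: " + methodName, e);
        }
    }
}
```

Calling AnnotationLookup.find(DemoController.class, "download") returns the annotation with value 200, while "unlimited" yields null. A Filter has no resolved Method at the point where it runs, which is why this lookup belongs in the interceptor.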
Core Components
@BandwidthLimit: declarative annotation that configures throttling parameters.
BandwidthLimitInterceptor: intercepts the request, parses the annotation, and creates the response wrapper.
BandwidthLimitManager: manages shared token buckets for global, API, user, and IP dimensions.
BandwidthLimitResponseWrapper: extends HttpServletResponseWrapper and overrides getOutputStream() to return a custom throttling stream.
RateLimitedOutputStream: implements the throttling logic by delegating to a TokenBucket.
TokenBucket: concrete implementation of the token-bucket algorithm.
BandwidthLimitHelper: utility that retrieves the wrapped response from request attributes for controller use.
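The article does not list the body of RateLimitedOutputStream, so the following is a minimal sketch of how such a stream might pay for each chunk before writing it. It extends java.io.OutputStream rather than ServletOutputStream to stay runnable with the JDK alone. The Throttle interface and the ThrottledOutputStream name are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Objects;

/** Hypothetical minimal view of a token bucket: blocks until `permits` tokens are available. */
interface Throttle {
    void acquire(long permits);
}

/** Sketch of a throttled stream: acquires tokens for each chunk, then writes that chunk. */
class ThrottledOutputStream extends OutputStream {
    private final OutputStream delegate;
    private final Throttle throttle;
    private final int chunkSize;

    ThrottledOutputStream(OutputStream delegate, Throttle throttle, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.delegate = Objects.requireNonNull(delegate);
        this.throttle = Objects.requireNonNull(throttle);
        this.chunkSize = chunkSize;
    }

    @Override
    public void write(int b) throws IOException {
        throttle.acquire(1); // a single byte costs one token
        delegate.write(b);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        Objects.checkFromIndexSize(off, len, buf.length);
        int written = 0;
        while (written < len) {
            // split large writes so no single acquire() asks for more than chunkSize tokens
            int n = Math.min(chunkSize, len - written);
            throttle.acquire(n);
            delegate.write(buf, off + written, n);
            written += n;
        }
    }

    @Override
    public void flush() throws IOException {
        delegate.flush();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }
}
```

Writing 10 bytes with a chunk size of 4 results in three acquire calls (4, 4, and 2 tokens). Smaller chunks mean shorter individual waits and smoother output, at the cost of more write calls.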
Multi‑Dimensional Throttling
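The four dimensions in this section differ mainly in how the bucket key is derived. A minimal sketch of a shared bucket registry follows, assuming a ConcurrentHashMap keyed by dimension and identifier. Dimension, BucketRegistry, and the key format are illustrative assumptions and not the article's BandwidthLimitManager. The type parameter B stands in for the TokenBucket type.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical mirror of the article's LimitType values. */
enum Dimension { GLOBAL, API, USER, IP }

/** Sketch of a registry that hands out one shared bucket per key. */
final class BucketRegistry<B> {
    private final Map<String, B> buckets = new ConcurrentHashMap<>();

    /** One fixed key for GLOBAL; otherwise the dimension name plus its identifier. */
    static String key(Dimension type, String identifier) {
        return type == Dimension.GLOBAL ? "GLOBAL" : type + ":" + identifier;
    }

    /** Returns the existing bucket for the key, or creates it atomically on first use. */
    B getOrCreate(Dimension type, String identifier, Function<String, B> factory) {
        return buckets.computeIfAbsent(key(type, identifier), factory);
    }
}
```

For example, the identifier could be the request path for API, a user ID for USER, and the client address for IP. Because USER and IP keys are unbounded, a production registry would likely also need to evict idle buckets, which this sketch leaves out.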
Global (GLOBAL)
All requests share a single bucket, useful for protecting the overall server egress bandwidth. Example: a 10 MB/s global limit ensures the total outbound traffic never exceeds 10 MB/s even with 100 concurrent downloads.
@BandwidthLimit(value = 200, unit = BandwidthUnit.KB, type = LimitType.GLOBAL)
@GetMapping("/download/global")
public void downloadGlobal(HttpServletRequest request, HttpServletResponse response) throws IOException {
    // the interceptor stored the wrapped response as a request attribute
    HttpServletResponse limitedResponse = BandwidthLimitHelper.getLimitedResponse(request, response);
    // write data ...
}
API Dimension (API)
Each endpoint has an independent bucket, so traffic on one API does not affect another.
@BandwidthLimit(value = 500, unit = BandwidthUnit.KB, type = LimitType.API)
@GetMapping("/download/file")
public void downloadFile(HttpServletResponse response) throws IOException {
    // file download logic
}
@BandwidthLimit(value = 2048, unit = BandwidthUnit.KB, type = LimitType.API)
@GetMapping("/stream/video")
public void streamVideo(HttpServletResponse response) throws IOException {
    // video streaming logic
}
User Dimension (USER)
Limits are applied per user identifier (e.g., request header X-User-Id). The free and vip parameters enable differentiated service levels.
@BandwidthLimit(value = 200, unit = BandwidthUnit.KB, type = LimitType.USER, free = 200, vip = 2048)
@GetMapping("/download/user")
public void downloadByUser(@RequestHeader("X-User-Type") String userType, HttpServletResponse response) throws IOException {
    // automatically applies 200KB/s or 2MB/s based on user type
}
IP Dimension (IP)
Limits are applied per client IP address, protecting against a single IP monopolizing bandwidth. Supports proxy headers such as X-Forwarded-For and X-Real-IP.
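Before the annotation usage, here is a rough sketch of how the client address might be resolved from those proxy headers. The ClientIp class and its resolve method are assumptions for illustration, and the header values are passed in as plain strings so the example runs without the servlet API. Both headers can be forged by clients, so they should only be trusted when a known reverse proxy sets them.

```java
/** Sketch of client IP resolution; the names here are illustrative, not the article's code. */
final class ClientIp {
    /** Prefers X-Forwarded-For, then X-Real-IP, then the socket's remote address. */
    static String resolve(String xForwardedFor, String xRealIp, String remoteAddr) {
        if (usable(xForwardedFor)) {
            // X-Forwarded-For is a comma-separated chain; the first entry is the original client.
            // The limit of 2 keeps empty trailing parts, so the array always has an element 0.
            String first = xForwardedFor.split(",", 2)[0].trim();
            if (usable(first)) {
                return first;
            }
        }
        if (usable(xRealIp)) {
            return xRealIp.trim();
        }
        return remoteAddr;
    }

    /** Some proxies send the literal "unknown"; treat it like a missing header. */
    private static boolean usable(String value) {
        return value != null && !value.isBlank() && !"unknown".equalsIgnoreCase(value.trim());
    }
}
```

For instance, resolve("203.0.113.7, 10.0.0.1", null, "10.0.0.2") returns "203.0.113.7", and the remote address is used only when neither header carries a usable value.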
@BandwidthLimit(value = 300, unit = BandwidthUnit.KB, type = LimitType.IP)
@GetMapping("/download/ip")
public void downloadByIp(HttpServletResponse response) throws IOException {
    // each IP limited to 300KB/s
}
Key Code Implementations
1. Token Bucket Core Algorithm
The bucket uses System.nanoTime() for nanosecond precision and performs token refill, wait‑time calculation, and precise sleeping.
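public synchronized void acquire(long permits) {
    // 1. refill tokens for the time elapsed since the last refill
    refill();
    // 2. enough tokens: consume them and send immediately
    if (tokens >= permits) {
        tokens -= permits;
        return;
    }
    // 3. not enough: compute how long the missing tokens take to accrue
    long deficit = permits - tokens;
    long waitNanos = (deficit * 1_000_000_000L) / refillRate;
    // 4. precise wait
    sleepNanos(waitNanos);
    // 5. the wait paid for the deficit: empty the bucket and restart the refill
    //    clock, otherwise the next refill() credits the slept interval again
    tokens = 0;
    lastRefillTime = System.nanoTime();
}
private void refill() {
    long now = System.nanoTime();
    long elapsedNanos = now - lastRefillTime;
    long newTokens = (elapsedNanos * refillRate) / 1_000_000_000L;
    // advance the clock only when at least one token accrued, so frequent
    // calls do not discard fractional tokens
    if (newTokens > 0) {
        tokens = Math.min(capacity, tokens + newTokens);
        lastRefillTime = now;
    }
}
2. Response Wrapper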
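import jakarta.servlet.ServletOutputStream;
import jakarta.servlet.http.HttpServletResponse;
import jakarta.servlet.http.HttpServletResponseWrapper;
import java.io.IOException;
public class BandwidthLimitResponseWrapper extends HttpServletResponseWrapper {
    private final TokenBucket sharedTokenBucket;
    private final long bandwidthBytesPerSecond;
    private final int chunkSize;
    private RateLimitedOutputStream limitedOutputStream;
    public BandwidthLimitResponseWrapper(HttpServletResponse response, TokenBucket sharedTokenBucket,
                                         long bandwidthBytesPerSecond, int chunkSize) {
        super(response);
        this.sharedTokenBucket = sharedTokenBucket;
        this.bandwidthBytesPerSecond = bandwidthBytesPerSecond;
        this.chunkSize = chunkSize;
    }
    @Override
    public ServletOutputStream getOutputStream() throws IOException {
        // without a bucket, fall back to the unthrottled stream instead of returning null
        if (sharedTokenBucket == null) {
            return super.getOutputStream();
        }
        // create the throttled stream lazily and reuse it for later calls
        if (limitedOutputStream == null) {
            limitedOutputStream = new RateLimitedOutputStream(
                    super.getOutputStream(), sharedTokenBucket, bandwidthBytesPerSecond, chunkSize);
        }
        return limitedOutputStream;
    }
}
3. Interceptor Creating the Wrapper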
@Override
public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) {
    BandwidthLimit annotation = findAnnotation(handler);
    if (annotation != null) {
        TokenBucket bucket = limitManager.getBucket(
                annotation.type(), annotation.key(), annotation.capacity(), annotation.rate());
        BandwidthLimitResponseWrapper wrapped = new BandwidthLimitResponseWrapper(
                response, bucket, annotation.value(), annotation.chunkSize());
        request.setAttribute("BandwidthLimitWrappedResponse", wrapped);
    }
    return true;
}
4. Controller Using the Wrapped Response
@GetMapping("/download/global")
public void downloadGlobal(HttpServletRequest request, HttpServletResponse response) throws IOException {
    HttpServletResponse limited = BandwidthLimitHelper.getLimitedResponse(request, response);
    limited.setContentType("application/octet-stream");
    limited.setHeader("Content-Disposition", "attachment; filename=test.bin");
    // write data – throttling applied automatically
    limited.getOutputStream().write(data);
}
Parameter Tuning Guide
Bucket Capacity Selection
Capacity determines burst handling capability:
Rate × 0.5 – smoothest traffic; bursts are limited to about half a second of data.
Rate × 1.0 – allows a 1-second burst (default recommendation).
Rate × 2.0 – allows a 2-second burst for better first-screen load.
Chunk Size Selection
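Chunk size influences smoothness. A practical starting point is chunkSize = bandwidth / 50, kept within the ranges below (at higher rates the raw formula exceeds them: 1 MB/s ÷ 50 is roughly 20 KB):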
200 KB/s → 1‑4 KB chunks (small chunks ensure smoothness).
1 MB/s → 4‑8 KB chunks (balance smoothness and performance).
5 MB/s+ → 8‑16 KB chunks (reduce system‑call overhead).
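One way to turn the formula and the ranges above into code is sketched here. The tier boundaries (below 1 MB/s, below 5 MB/s, and above), the 1 KB floor, and the ChunkSizes name are assumptions chosen to match the listed ranges. The article does not show how its automatic calculation is implemented.

```java
/** Sketch of an automatic chunk-size calculation; tier boundaries are assumptions. */
final class ChunkSizes {
    private static final long KB = 1024L;
    private static final long MB = 1024L * KB;

    /** bandwidth / 50, kept between 1 KB and an upper bound that grows with the rate. */
    static int auto(long bytesPerSecond) {
        long upper;
        if (bytesPerSecond < MB) {
            upper = 4 * KB;   // low rates: small chunks keep the output smooth
        } else if (bytesPerSecond < 5 * MB) {
            upper = 8 * KB;   // mid rates: balance smoothness and performance
        } else {
            upper = 16 * KB;  // high rates: fewer, larger writes
        }
        long raw = bytesPerSecond / 50;
        return (int) Math.max(KB, Math.min(upper, raw));
    }
}
```

Under these assumptions, 200 KB/s yields 4096 bytes, 1 MB/s yields 8192 bytes, and 5 MB/s yields 16384 bytes.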
// automatic calculation (recommended)
@BandwidthLimit(value = 200, unit = BandwidthUnit.KB, chunkSize = -1)
// manual specification
@BandwidthLimit(value = 200, unit = BandwidthUnit.KB, chunkSize = 4096)
Conclusion
The article demonstrates a Spring Boot implementation of multi‑dimensional bandwidth throttling based on the token‑bucket algorithm, leveraging HandlerInterceptor and HttpServletResponseWrapper. It supports global, API, user, and IP dimensions, provides real‑time statistics, and is suitable for protecting APIs, file downloads, and video streams.
Source code repository: https://github.com/yuboon/java-examples/tree/master/springboot-netspeed-limit
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
