Designing High‑Concurrency Architecture: Principles, Idempotency, Rate Limiting and a Token‑Bucket Demo
The article explains how to design a backend architecture that can handle millions of concurrent requests by applying principles such as service decomposition, high availability, idempotent business logic, and various rate‑limiting algorithms—including sliding window, leaky bucket and token bucket—with a runnable Java demo.
With the rapid growth of the Internet, software services face ever‑increasing user traffic; when traffic reaches tens of thousands of requests per second, smooth operation and instant response become critical, much like the traffic surge on Taobao during Double‑11.
Principles for Architecture Design
1. Achieve High Concurrency
• Service splitting: divide the whole project into multiple sub‑projects or modules so that each can be scaled horizontally.
• Service‑oriented architecture: once calls between services become complex, introduce service registration and discovery.
• Message queue: decouple components and enable asynchronous processing.
• Caching: use caches at several layers to improve concurrent performance.
2. Achieve High Availability
Use clustering, traffic limiting (rate limiting), and degradation strategies to keep the system resilient.
3. Business Design
• Idempotency: ensure that repeating the same request has the same effect as sending it once, preventing side effects such as double charging in payment scenarios.
• Anti‑duplicate submission: generate a unique token (e.g., CSRF token), store it in the user's session, and embed it in a hidden form field; on submission the server validates the token and rejects requests whose token is repeated or missing.
Typical server‑side rejection cases:
Token in session does not match token submitted with the form.
Session does not contain a token.
Form submission lacks a token.
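The three rejection cases above can be sketched as a small one‑time token store. This is a minimal illustration under stated assumptions, not the article's implementation: the class name FormTokenStore, its methods, and the use of an in‑memory map in place of a real HTTP session are all hypothetical.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: names are illustrative and the map stands in for a real session store.
class FormTokenStore {
    // sessionId -> token currently expected for that session
    private final Map<String, String> tokens = new ConcurrentHashMap<>();

    // Called when rendering the form; the returned value goes into the hidden field.
    String issue(String sessionId) {
        String token = UUID.randomUUID().toString();
        tokens.put(sessionId, token);
        return token;
    }

    // Called on submission. Returns true at most once per issued token.
    boolean consume(String sessionId, String submitted) {
        if (submitted == null) {
            return false; // form submission lacks a token
        }
        // remove(key, value) is atomic: it succeeds only if the session still holds
        // exactly this token, so a mismatch, an absent session token, or a repeated
        // submission all return false.
        return tokens.remove(sessionId, submitted);
    }

    public static void main(String[] args) {
        FormTokenStore store = new FormTokenStore();
        String token = store.issue("session-1");
        System.out.println(store.consume("session-1", token)); // first submit: true
        System.out.println(store.consume("session-1", token)); // repeated submit: false
        System.out.println(store.consume("session-1", null));  // missing token: false
    }
}
```

Removing the token as part of validation is what makes the second submission fail; a check followed by a separate delete would leave a window in which two concurrent submissions could both pass.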
State Machine
In software design, a finite‑state machine (FSM) models a limited set of states and the transitions between them.
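As a rough illustration of how a state machine might support the business‑design goals above, the sketch below models an order whose transitions are checked against an allow‑list, so a repeated "pay" request cannot move an already‑paid order again. The states, transitions, and class name are assumptions chosen for the example, not taken from the article.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical order lifecycle; states and transitions are assumptions for illustration.
class OrderStateMachine {
    enum State { CREATED, PAID, SHIPPED, CANCELLED }

    // For each state, the set of states it may move to.
    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.CREATED, EnumSet.of(State.PAID, State.CANCELLED));
        ALLOWED.put(State.PAID, EnumSet.of(State.SHIPPED));
        ALLOWED.put(State.SHIPPED, EnumSet.noneOf(State.class));
        ALLOWED.put(State.CANCELLED, EnumSet.noneOf(State.class));
    }

    private State current = State.CREATED;

    // Applies the transition if it is allowed; otherwise leaves the state untouched.
    synchronized boolean transitionTo(State next) {
        if (!ALLOWED.get(current).contains(next)) {
            return false;
        }
        current = next;
        return true;
    }

    synchronized State current() {
        return current;
    }

    public static void main(String[] args) {
        OrderStateMachine order = new OrderStateMachine();
        System.out.println(order.transitionTo(State.PAID));    // true
        System.out.println(order.transitionTo(State.PAID));    // false: already paid
        System.out.println(order.transitionTo(State.SHIPPED)); // true
        System.out.println(order.current());                   // SHIPPED
    }
}
```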
Rate Limiting Purpose
Rate limiting protects system availability by throttling concurrent requests or limiting the number of requests within a time window; excess requests are rejected with a “server busy, please try later” message.
Rate Limiting Methods
Limit instantaneous concurrency (e.g., Nginx limit_conn per IP).
Limit total concurrency via database connection pools or thread pools.
Limit average rate within a time window at the API layer.
Other limits: remote API call rate, MQ consumption rate.
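Limiting total concurrency, as in the second method above, can be approximated in application code with a counting semaphore. The sketch below is one possible shape, assuming a hypothetical ConcurrencyLimiter wrapper; connection pools and thread pools enforce the same idea through their own size settings.

```java
import java.util.concurrent.Semaphore;

// Hypothetical wrapper: caps how many tasks may run at the same time.
class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the task if a permit is free; otherwise rejects immediately without waiting.
    boolean tryRun(Runnable task) {
        if (!permits.tryAcquire()) {
            return false; // caller can answer "server busy, please try later"
        }
        try {
            task.run();
            return true;
        } finally {
            permits.release(); // always give the permit back, even if the task throws
        }
    }

    public static void main(String[] args) {
        ConcurrencyLimiter limiter = new ConcurrencyLimiter(1);
        // The nested call runs while the outer task still holds the only permit.
        boolean outer = limiter.tryRun(() -> {
            boolean inner = limiter.tryRun(() -> { });
            System.out.println("inner accepted: " + inner); // false
        });
        System.out.println("outer accepted: " + outer);     // true
    }
}
```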
Common Rate‑Limiting Algorithms
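1. Sliding Window – the name comes from the sliding window protocol in networking, which improves throughput by allowing multiple packets to be sent before waiting for acknowledgments. Applied to rate limiting, the window is a time interval that moves with the current time: a request is accepted only if the number of requests already inside the window is below the limit.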
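When the sliding‑window idea is used for rate limiting, it is usually applied to request timestamps rather than packets. The following is a minimal sketch of the "sliding window log" variant; the class name is hypothetical, and the current time is passed in explicitly so the behavior is easy to follow and test.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sliding-window-log limiter: at most maxRequests in any window of windowMillis.
class SlidingWindowLimiter {
    private final int maxRequests;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>(); // accepted requests, oldest first

    SlidingWindowLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Drop timestamps that have slid out of the window (now - window, now].
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() >= maxRequests) {
            return false;
        }
        timestamps.addLast(nowMillis);
        return true;
    }

    // Convenience overload using the wall clock.
    boolean tryAcquire() {
        return tryAcquire(System.currentTimeMillis());
    }

    public static void main(String[] args) {
        SlidingWindowLimiter limiter = new SlidingWindowLimiter(2, 1000);
        System.out.println(limiter.tryAcquire(0));    // true
        System.out.println(limiter.tryAcquire(500));  // true
        System.out.println(limiter.tryAcquire(900));  // false: two requests already in window
        System.out.println(limiter.tryAcquire(1000)); // true: the request at t=0 has expired
    }
}
```

Storing one timestamp per accepted request costs memory proportional to the limit; that is the price of avoiding the boundary burst a plain counter allows.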
2. Leaky Bucket – forces a fixed transmission rate; excess requests overflow and can be dropped or queued.
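A leaky bucket in its queueing form might be sketched as a bounded queue that is drained at a fixed interval. The class and method names below are hypothetical, and dropping on overflow is just one of the two options mentioned above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical leaky bucket: requests queue up and leave at a fixed rate.
class LeakyBucket {
    private final BlockingQueue<Runnable> bucket;

    LeakyBucket(int capacity) {
        this.bucket = new ArrayBlockingQueue<>(capacity);
    }

    // Inflow: returns false when the bucket is full, i.e. the request overflows and is dropped.
    boolean offer(Runnable request) {
        return bucket.offer(request);
    }

    // Outflow: handles at most one queued request. Meant to be called at a fixed interval.
    boolean leakOne() {
        Runnable next = bucket.poll();
        if (next == null) {
            return false;
        }
        next.run();
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        LeakyBucket bucket = new LeakyBucket(2);
        for (int i = 1; i <= 4; i++) {
            int id = i;
            boolean queued = bucket.offer(() -> System.out.println("handled request " + id));
            System.out.println("request " + id + " queued: " + queued);
        }
        // Drain one request every 100 ms, regardless of how fast requests arrived.
        ScheduledExecutorService drain = Executors.newSingleThreadScheduledExecutor();
        drain.scheduleAtFixedRate(bucket::leakOne, 0, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(500);
        drain.shutdownNow();
    }
}
```

With a capacity of 2, requests 1 and 2 are queued and later handled one per tick, while requests 3 and 4 overflow.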
3. Token Bucket – suitable for bursty traffic; tokens are added to a bucket at a constant rate, and a request proceeds only if a token is available.
Example configuration: Rate = 2 tokens per second, bucket size = 100.
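To make the refill arithmetic behind that configuration concrete, here is a hand‑rolled sketch with a rate of 2 tokens per second and a bucket size of 100. It is an illustration only, not Guava's implementation: the class name is hypothetical, the bucket is assumed to start full, and time is passed in as a parameter.

```java
// Hypothetical token bucket: refills continuously, allows bursts up to its capacity.
class TokenBucket {
    private final double capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(double capacity, double refillPerSecond, long startNanos) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity; // assumption: the bucket starts full
        this.lastRefillNanos = startNanos;
    }

    synchronized boolean tryAcquire(long nowNanos) {
        // Add the tokens earned since the last call, capped at the bucket size.
        double elapsedSeconds = (nowNanos - lastRefillNanos) / 1e9;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillNanos = nowNanos;
        if (tokens < 1) {
            return false; // no token available: reject
        }
        tokens -= 1;
        return true;
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(100, 2, 0L);
        int burst = 0;
        for (int i = 0; i < 150; i++) {
            if (bucket.tryAcquire(0L)) burst++;
        }
        System.out.println("burst accepted: " + burst); // 100: the full bucket absorbs the burst

        int later = 0;
        for (int i = 0; i < 3; i++) {
            if (bucket.tryAcquire(1_000_000_000L)) later++;
        }
        System.out.println("after 1s accepted: " + later); // 2: one second refills two tokens
    }
}
```

The bucket size bounds the burst and the refill rate bounds the long‑run average, which is why this algorithm suits bursty traffic.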
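Below is a small Java demo that uses Guava's RateLimiter, here configured for 10 permits per second, as a token‑bucket rate limiter. Twenty threads each issue one request at a random moment within about one second, so some requests obtain a token and others are rejected.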
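import com.google.common.util.concurrent.RateLimiter;

import java.io.IOException;
import java.util.Random;
import java.util.concurrent.CountDownLatch;

public class TokenDemo {
    // qps: queries per second; tps: transactions per second.
    // Here qps is set to 10.
    private final RateLimiter rateLimiter = RateLimiter.create(10);

    public void doSomething() {
        if (rateLimiter.tryAcquire()) {
            // Token acquired successfully: handle the request.
            System.out.println("Processed normally");
        } else {
            // No token available: reject immediately instead of waiting.
            System.out.println("Processing failed");
        }
    }

    public static void main(String[] args) throws IOException {
        // The latch holds every worker thread back until countDown() is called,
        // so all 20 threads are released at roughly the same moment.
        CountDownLatch latch = new CountDownLatch(1);
        Random random = new Random(10);
        TokenDemo tokenDemo = new TokenDemo();
        for (int i = 0; i < 20; i++) {
            new Thread(() -> {
                try {
                    latch.await();
                    // Spread the requests randomly over about one second.
                    Thread.sleep(random.nextInt(1000));
                    tokenDemo.doSomething();
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }).start();
        }
        latch.countDown();
        // Wait for a key press before main returns.
        System.in.read();
    }
}

Execution result (sample; each entry is printed on its own line):
Processed normally, Processed normally, Processed normally, Processed normally, Processed normally, Processing failed, Processed normally, Processing failed, Processing failed, Processing failed, Processed normally, Processing failed, Processed normally, Processing failed, Processed normally, Processed normally, Processed normally, Processed normally, Processing failed, Processing failed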
The output shows that when tokens are exhausted, requests are rejected, achieving rate limiting.
4. Counter – the simplest method, limiting the number of requests within a defined time interval.
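A fixed‑window counter can be sketched in a few lines. The class name below is hypothetical and time is passed in explicitly; the example also shows the method's known weakness, namely that a burst straddling a window boundary can briefly exceed the intended rate.

```java
// Hypothetical fixed-window counter: at most `limit` requests per window of windowMillis.
class FixedWindowCounter {
    private final int limit;
    private final long windowMillis;
    private long windowStart;
    private int count;

    FixedWindowCounter(int limit, long windowMillis, long startMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.windowStart = startMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Start a fresh window, and reset the counter, once the current one has elapsed.
        if (nowMillis - windowStart >= windowMillis) {
            windowStart = nowMillis;
            count = 0;
        }
        if (count >= limit) {
            return false;
        }
        count++;
        return true;
    }

    public static void main(String[] args) {
        FixedWindowCounter counter = new FixedWindowCounter(2, 1000, 0);
        System.out.println(counter.tryAcquire(900));  // true
        System.out.println(counter.tryAcquire(950));  // true
        System.out.println(counter.tryAcquire(990));  // false: limit reached in this window
        System.out.println(counter.tryAcquire(1000)); // true: new window, counter reset
        System.out.println(counter.tryAcquire(1050)); // true: 4 accepted within 150 ms
    }
}
```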
Source: http://www.cnblogs.com/GodHeng/p/8834810.html
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
