Designing High‑Concurrency Architecture: Principles, Idempotency, Rate Limiting and a Token‑Bucket Demo

The article explains how to design a backend architecture that can handle millions of concurrent requests by applying principles such as service decomposition, high availability, idempotent business logic, and various rate‑limiting algorithms—including sliding window, leaky bucket and token bucket—with a runnable Java demo.

Architecture Digest

With the rapid growth of the Internet, software services face ever‑increasing user traffic. Once traffic reaches tens of thousands of requests per second, keeping the service running smoothly and responding quickly becomes critical, as with the traffic surge on Taobao during Double‑11.

Principles for Architecture Design

1. Achieve High Concurrency

• Service splitting: divide the whole project into multiple sub‑projects or modules so that each can be scaled horizontally.
• Service‑oriented architecture: once services call one another in complex ways, solve service registration and discovery.
• Message queue: decouple components and enable asynchronous processing.
• Caching: use various caches to improve concurrent performance.

2. Achieve High Availability

Use clustering, traffic limiting (rate limiting), and degradation strategies to keep the system resilient.

3. Business Design

• Idempotency: ensure that repeating the same request any number of times produces the same result as making it once, preventing side effects such as double charging in payment scenarios.
• Anti‑duplicate submission: generate a unique token (similar to a CSRF token), store it in the user’s session, and embed it in a hidden form field; the server validates the token on submission and rejects requests whose token is missing, mismatched, or already used.

Typical server‑side rejection cases:

Token in session does not match token submitted with the form.

Session does not contain a token.

Form submission lacks a token.
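The rejection cases above can be sketched as a small token guard. This is a minimal illustration, not the original author's code: FormTokenGuard, issue, and accept are hypothetical names, and a ConcurrentHashMap stands in for the user's session store.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical one-time form token guard; the map is an assumed stand-in for the HTTP session. */
final class FormTokenGuard {
    private static final SecureRandom RANDOM = new SecureRandom();
    // sessionId -> token most recently issued for that session
    private final Map<String, String> sessionTokens = new ConcurrentHashMap<>();

    /** Issues a fresh token when the form is rendered; the caller embeds it in a hidden field. */
    String issue(String sessionId) {
        byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        sessionTokens.put(sessionId, token);
        return token;
    }

    /** Returns true only for the first submission that carries the matching token. */
    boolean accept(String sessionId, String submittedToken) {
        if (submittedToken == null) {
            return false; // form submission lacks a token
        }
        // remove(key, value) is atomic and succeeds only if the session holds exactly this token,
        // so a session without a token, a mismatch, and a repeated submission all return false.
        return sessionTokens.remove(sessionId, submittedToken);
    }
}
```

Consuming the token with a single atomic remove means two concurrent submissions of the same form cannot both pass the check.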

State Machine

In software design, a finite‑state machine (FSM) models a limited set of states and the transitions between them.
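One way to picture this in business code is an order whose status may only move along allowed transitions. The sketch below assumes a hypothetical order lifecycle (CREATED, PAID, SHIPPED, CANCELLED); the states and class names are illustrative and do not come from the original article.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical order states, chosen only for illustration. */
enum OrderState { CREATED, PAID, SHIPPED, CANCELLED }

final class OrderStateMachine {
    // Transition table: for each state, the set of states it may move to.
    private static final Map<OrderState, Set<OrderState>> ALLOWED = new EnumMap<>(OrderState.class);
    static {
        ALLOWED.put(OrderState.CREATED, EnumSet.of(OrderState.PAID, OrderState.CANCELLED));
        ALLOWED.put(OrderState.PAID, EnumSet.of(OrderState.SHIPPED));
        ALLOWED.put(OrderState.SHIPPED, EnumSet.noneOf(OrderState.class));
        ALLOWED.put(OrderState.CANCELLED, EnumSet.noneOf(OrderState.class));
    }

    private OrderState state = OrderState.CREATED;

    /** Applies the transition if it is legal from the current state; otherwise leaves the state unchanged. */
    synchronized boolean transitionTo(OrderState next) {
        if (!ALLOWED.get(state).contains(next)) {
            return false;
        }
        state = next;
        return true;
    }

    synchronized OrderState current() {
        return state;
    }
}
```

Because a second PAID event is refused once the order is already paid, a state check of this kind is also one common way to support the idempotency requirement described earlier.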

Rate Limiting Purpose

Rate limiting protects system availability by throttling concurrent requests or limiting the number of requests within a time window; excess requests are rejected with a “server busy, please try later” message.

Rate Limiting Methods

Limit instantaneous concurrency (e.g., Nginx limit_conn per IP).

Limit total concurrency via database connection pools or thread pools.

Limit average rate within a time window at the API layer.

Other limits: remote API call rate, MQ consumption rate.
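Limiting total concurrency inside the application can be sketched with a counting semaphore from the JDK. ConcurrencyLimiter and its call method are hypothetical names for this illustration; the fallback value plays the role of the "server busy" response.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical wrapper that caps how many tasks may run at the same time. */
final class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs the task if a permit is free; otherwise returns the fallback without waiting. */
    <T> T call(Supplier<T> task, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback; // limit reached: reject immediately
        }
        try {
            return task.get();
        } finally {
            permits.release(); // always hand the permit back, even if the task throws
        }
    }
}
```

A caller might wrap a handler as `limiter.call(() -> handle(request), "server busy, please try later")`, where `handle` is whatever the service already does.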

Common Rate‑Limiting Algorithms

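1. Sliding Window – counts the requests inside a time window that moves forward with the clock and rejects a request once the window is full. The name is borrowed from the sliding window protocol in networking, which improves throughput by allowing multiple packets to be sent before waiting for acknowledgments.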
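As a rate limiter, a sliding window can be sketched as a log of recent request timestamps. The class below is an assumption-laden illustration (SlidingWindowLimiter is a hypothetical name), and the current time is passed in explicitly so the behaviour is easy to follow and test.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window limiter that keeps one timestamp per accepted request. */
final class SlidingWindowLimiter {
    private final int maxRequests;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    /** nowMillis is supplied by the caller; production code might pass System.currentTimeMillis(). */
    synchronized boolean tryAcquire(long nowMillis) {
        // Evict timestamps that have slid out of the window (nowMillis - windowMillis, nowMillis].
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() >= maxRequests) {
            return false; // window is full
        }
        timestamps.addLast(nowMillis);
        return true;
    }
}
```

Storing every timestamp costs memory proportional to the limit; it is used here because it is the easiest variant to reason about.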

2. Leaky Bucket – forces a fixed transmission rate; excess requests overflow and can be dropped or queued.
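A minimal sketch of the leaky bucket used as a meter: each request adds one unit of water, the bucket drains at a constant rate, and a request that would overflow is dropped. LeakyBucket is a hypothetical class; queueing the excess instead of dropping it is not shown.

```java
/** Hypothetical leaky-bucket meter; time is passed in so the drain is deterministic. */
final class LeakyBucket {
    private final long capacity;
    private final double leakPerMilli;
    private double water;          // current fill level, in requests
    private long lastLeakMillis;

    LeakyBucket(long capacity, double leakPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.leakPerMilli = leakPerSecond / 1000.0;
        this.lastLeakMillis = nowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Drain the water that leaked out since the last call.
        double leaked = (nowMillis - lastLeakMillis) * leakPerMilli;
        water = Math.max(0.0, water - leaked);
        lastLeakMillis = nowMillis;
        if (water + 1.0 > capacity) {
            return false; // bucket would overflow: drop the request
        }
        water += 1.0;
        return true;
    }
}
```

The drain rate bounds the sustained throughput, while the capacity bounds how much backlog is tolerated before requests are dropped.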

3. Token Bucket – suitable for bursty traffic; tokens are added to a bucket at a constant rate, and a request proceeds only if a token is available. Unused tokens accumulate up to the bucket size, which is what allows a short burst to pass.

Example configuration: Rate = 2 tokens per second, bucket size = 100.
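To make this configuration concrete, here is a hand-written sketch of a token bucket with a rate of 2 tokens per second and a bucket size of 100. TokenBucket is a hypothetical class written for illustration; Guava's RateLimiter.create(double) takes only a rate, so the bucket size is modelled by hand here. Starting with a full bucket is an assumption of this sketch.

```java
/** Hypothetical token bucket; time is passed in so the refill is deterministic. */
final class TokenBucket {
    private final double capacity;
    private final double refillPerMilli;
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(double capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = refillPerSecond / 1000.0;
        this.tokens = capacity; // assumption: start full so an initial burst is allowed
        this.lastRefillMillis = nowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Add the tokens earned since the last call, capped at the bucket size.
        tokens = Math.min(capacity, tokens + (nowMillis - lastRefillMillis) * refillPerMilli);
        lastRefillMillis = nowMillis;
        if (tokens < 1.0) {
            return false; // no token available
        }
        tokens -= 1.0;
        return true;
    }
}
```

With `new TokenBucket(100, 2, now)`, a burst of up to 100 requests passes immediately, after which requests are admitted at roughly 2 per second.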

Below is a small Java demo that implements a token‑bucket rate limiter using Guava’s RateLimiter and demonstrates success and failure cases.

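import com.google.common.util.concurrent.RateLimiter;

import java.io.IOException;
import java.util.Random;
import java.util.concurrent.CountDownLatch;

public class TokenDemo {
    // qps: queries per second; tps: transactions per second
    // Here qps is set to 10
    private final RateLimiter rateLimiter = RateLimiter.create(10);

    public void doSomething() {
        // tryAcquire() does not block: it returns true if a token is available, false otherwise
        if (rateLimiter.tryAcquire()) {
            // Token acquired successfully
            System.out.println("Processed normally");
        } else {
            System.out.println("Processing failed");
        }
    }

    public static void main(String[] args) throws IOException {
        // CountDownLatch uses a counter to make threads wait until the count reaches zero;
        // here it releases all 20 threads at the same moment.
        CountDownLatch latch = new CountDownLatch(1);
        Random random = new Random(10);
        TokenDemo tokenDemo = new TokenDemo();
        for (int i = 0; i < 20; i++) {
            new Thread(() -> {
                try {
                    latch.await();
                    // Spread the 20 requests randomly over roughly one second
                    Thread.sleep(random.nextInt(1000));
                    tokenDemo.doSomething();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }
        latch.countDown();
        System.in.read(); // wait for console input before main returns
    }
}

Execution result (sample):

Processed normally Processed normally Processed normally Processed normally Processed normally Processing failed Processed normally Processing failed Processing failed Processing failed Processed normally Processing failed Processed normally Processing failed Processed normally Processed normally Processed normally Processed normally Processing failed Processing failed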

The output shows that when tokens are exhausted, requests are rejected, achieving rate limiting.

4. Counter – the simplest method, limiting the number of requests within a defined time interval.
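The counter approach can be sketched as a fixed-window counter that resets at the start of each interval. FixedWindowCounter is a hypothetical class for illustration, and the clock is passed in explicitly.

```java
/** Hypothetical fixed-window counter; allows at most maxRequests per window. */
final class FixedWindowCounter {
    private final int maxRequests;
    private final long windowMillis;
    private long windowStart;
    private int count;

    FixedWindowCounter(int maxRequests, long windowMillis, long nowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
        this.windowStart = nowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long elapsed = nowMillis - windowStart;
        if (elapsed >= windowMillis) {
            // A new interval has begun: align the start to the window grid and reset the count.
            windowStart = nowMillis - (elapsed % windowMillis);
            count = 0;
        }
        if (count >= maxRequests) {
            return false; // quota for this interval is used up
        }
        count++;
        return true;
    }
}
```

The simplicity has a cost: with a limit of 2 per second, two requests at 900 ms and 950 ms plus two more at 1000 ms and 1050 ms all pass, so four requests get through within 150 ms across the window boundary. A sliding window smooths out this edge effect.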

Source: http://www.cnblogs.com/GodHeng/p/8834810.html

Copyright Statement: Content is sourced from the web; rights belong to the original authors. We will remove it if any infringement is reported.
