
Defending Against Million‑QPS Attacks: Rate Limiting, Fingerprinting & Real‑Time Rules

This article explains how to protect systems from massive malicious traffic reaching millions of queries per second by combining gateway rate limiting, distributed circuit breaking, device fingerprinting, behavior analysis, dynamic rule engines, and real‑time risk scoring, illustrated with Nginx‑Lua, Sentinel, Drools, and Flink examples.

IT Services Circle

Jun 13, 2025

Preface

Today we tackle a challenge that keeps many developers awake at night: when malicious traffic hits like a tsunami, how do you protect your system?

Many teams have lived through API throttling incidents, but an attack at the scale of millions of QPS is a different kind of battle.

This article explores how to defend against API traffic reaching millions of QPS.

Why is a million QPS so deadly?

Illustrated below is the impact of a million QPS attack:

Attackers use three core weapons:

IP Ocean Tactics: a pool of 100k+ proxy IPs rotates dynamically, rendering traditional per-IP rate limiting ineffective.

Device Cloning: forged browser fingerprints mimic real devices.

Protocol-Level Precise Attacks: crafted HTTP requests bypass basic WAF rules.

The chain reaction that can crash a system includes:

Thread pools are 100% occupied → new requests time out.

Database connections are exhausted → SQL execution blocks.

Redis latency surges → cache penetration and avalanche.

Microservice circuit breakers trip in cascade → services become unavailable.

First Defense Line: Basic Rate Limiting and Circuit Breaking

1. Gateway Rate Limiting

Implement rate limiting at the gateway, typically using Nginx + Lua.

Example Nginx configuration:

# Assumes the http {} block declares the shared dict used below:
#   lua_shared_dict payment_limit 10m;
location /api/payment {
    access_by_lua_block {
        local limiter = require "resty.limit.req"
        -- leaky bucket: 1000 req/s steady rate + 2000 req/s burst allowance
        local lim, err = limiter.new("payment_limit", 1000, 2000)
        if not lim then
            ngx.log(ngx.ERR, "Failed to init limiter: ", err)
            return ngx.exit(500)
        end

        -- limit by client IP (each IP gets its own bucket)
        local key = ngx.var.remote_addr
        local delay, err = lim:incoming(key, true)

        if not delay then
            if err == "rejected" then
                ngx.header.content_type = "application/json"
                ngx.status = 429
                ngx.say([[{"code":429,"msg":"Too many requests"}]])
                return ngx.exit(429)
            end
            ngx.log(ngx.ERR, "Rate limit error: ", err)
            return ngx.exit(500)
        end

        -- request is inside the burst allowance: hold it to smooth the rate
        if delay >= 0.001 then
            ngx.sleep(delay)
        end
    }
}

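Code explanation:

Uses the resty.limit.req module from OpenResty's lua-resty-limit-traffic library.

Leaky bucket algorithm: a steady rate of 1000 requests per second plus a burst allowance of 2000; requests inside the burst are delayed, and anything beyond it is rejected.

The limit is keyed by client IP, so each IP gets its own bucket.

Requests over the limit receive HTTP 429 with a JSON error body.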
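
To make the limiter's behavior concrete, here is a minimal Python sketch of the leaky-bucket bookkeeping that a limiter such as resty.limit.req performs per key. It is an illustration under stated assumptions, not the library's code: the class name LeakyBucketLimiter, the explicit now argument, and the in-memory dict are all made up for this example.

```python
class LeakyBucketLimiter:
    """Per-key leaky bucket: `rate` requests/second, up to `burst` excess requests."""

    def __init__(self, rate, burst):
        self.rate = float(rate)
        self.burst = float(burst)
        self.state = {}  # key -> (excess, last_seen_seconds); hypothetical in-memory store

    def incoming(self, key, now):
        """Return the delay in seconds to apply, or None if the request is rejected."""
        if key in self.state:
            excess, last = self.state[key]
            elapsed = max(0.0, now - last)
            # The bucket drains at `rate` per second; this request adds one unit.
            excess = max(0.0, excess - self.rate * elapsed + 1.0)
        else:
            excess = 0.0  # first request for this key starts with an empty bucket
        if excess > self.burst:
            return None  # over the burst allowance: reject, state left unchanged
        self.state[key] = (excess, now)
        return excess / self.rate
```

With rate=2 and burst=1, the first request for a key passes immediately, a second one in the same instant is delayed by 0.5 s, and a third is rejected until the bucket has drained.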

2. Distributed Circuit Breaking

For high traffic, add a distributed circuit‑breaker such as a Sentinel cluster.

Sentinel cluster flow control configuration example:

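import com.alibaba.csp.sentinel.slots.block.ClusterRuleConstant;
import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.ClusterFlowConfig;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;
import java.util.Collections;
import javax.annotation.PostConstruct;

public class SentinelConfig {
    @PostConstruct
    public void initFlowRules() {
        // create a flow rule and switch it to cluster mode
        FlowRule rule = new FlowRule();
        rule.setResource("createOrder"); // protected resource
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS); // QPS limit
        rule.setCount(50000); // 50k QPS cluster threshold
        rule.setClusterMode(true); // enable cluster mode
        rule.setClusterConfig(new ClusterFlowConfig()
            .setFlowId(123L) // globally unique rule ID
            .setThresholdType(ClusterRuleConstant.FLOW_THRESHOLD_GLOBAL) // global threshold
        );
        // load rule on the token client side
        FlowRuleManager.loadRules(Collections.singletonList(rule));
    }
}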

Flow diagram:

Implementation principle:

Token server centrally manages cluster traffic quota.

Gateway nodes request tokens from the token server in real time.

When total QPS exceeds the threshold, each node’s traffic is proportionally limited.

Avoids imbalance caused by single‑node rate limiting.
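
The token-server idea can be sketched in a few lines of Python. This is a single-process simulation under simplifying assumptions (a fixed one-second window, no network calls, no fallback when the server is unreachable); the names TokenServer and simulate are hypothetical and do not correspond to Sentinel's API.

```python
class TokenServer:
    """Grants at most `threshold` tokens per one-second window, cluster-wide."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.window = None   # current one-second window (integer second)
        self.granted = 0     # tokens handed out in the current window

    def request_token(self, now):
        current = int(now)
        if current != self.window:
            # A new second has started: reset the shared quota.
            self.window = current
            self.granted = 0
        if self.granted < self.threshold:
            self.granted += 1
            return True
        return False


def simulate(server, requests_per_node, now):
    """Each gateway node asks the shared server before letting a request through."""
    passed = {}
    for node, count in requests_per_node.items():
        passed[node] = sum(server.request_token(now) for _ in range(count))
    return passed
```

However unevenly the load is spread across gateway nodes, the total number of requests passed in a window never exceeds the shared threshold, which is the property per-node limits cannot guarantee.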

Second Defense Line: Device Fingerprinting and Behavior Analysis

1. Browser Fingerprint Generation

The frontend can generate a fingerprint in the browser; even if the IP changes, the same device yields the same fingerprint.

Implementation using Canvas and WebGL:

// Frontend device fingerprint generation
// Assumes an md5() helper is loaded from a hashing library; it is not a browser built-in.
function generateDeviceFingerprint() {
    // 1. Collect basic device info
    const baseInfo = [
        navigator.userAgent,
        navigator.platform,
        screen.width + 'x' + screen.height,
        navigator.language
    ].join('|');

    // 2. Generate Canvas fingerprint
    const canvas = document.createElement('canvas');
    const ctx = canvas.getContext('2d');
    ctx.fillStyle = '#f60';
    ctx.fillRect(0, 0, 100, 30);
    ctx.fillStyle = '#069';
    ctx.font = '16px Arial';
    ctx.fillText('Defense is art', 10, 20);
    const canvasData = canvas.toDataURL();

    // 3. Generate WebGL fingerprint. A canvas that already has a 2D context
    //    returns null for 'webgl', so use a separate canvas and guard for null.
    const glCanvas = document.createElement('canvas');
    const gl = glCanvas.getContext('webgl');
    const debugInfo = gl ? gl.getExtension('WEBGL_debug_renderer_info') : null;
    const renderer = debugInfo
        ? gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL)
        : 'unknown';

    // 4. Combine into final fingerprint
    const fingerprint = md5(baseInfo + canvasData + renderer);
    return fingerprint;
}

Fingerprint characteristics:

Stability: >98% consistency on the same device.

Uniqueness: <0.1% collision across different devices.

Stealth: transparent to users, hard to clear.
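
On the server side, one way to put the fingerprint to work against rotating proxy pools is to count how many distinct IPs present the same fingerprint. The Python sketch below is an assumption about how this could be wired up, not part of the original design: FingerprintTracker, its in-memory storage, and the max_ips_per_device threshold are hypothetical.

```python
from collections import defaultdict


class FingerprintTracker:
    """Flags a device fingerprint that shows up from too many distinct IPs."""

    def __init__(self, max_ips_per_device):
        self.max_ips = max_ips_per_device
        self.ips_by_fp = defaultdict(set)  # fingerprint -> set of IPs seen

    def observe(self, fingerprint, ip):
        """Record the pair; return True when the device appears to rotate IPs."""
        self.ips_by_fp[fingerprint].add(ip)
        return len(self.ips_by_fp[fingerprint]) > self.max_ips
```

A production version would need expiry for old entries and shared storage across gateway nodes; both are left out here to keep the idea visible.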

2. Behavior Analysis Model

Analyze user behavior such as mouse movements.

Example Python model:

import numpy as np

# Assumed to be defined elsewhere: calc_linearity(move_events) and a
# pretrained risk_model whose predict() returns anomaly probabilities.

def analyze_mouse_behavior(move_events):
    """
    Analyze mouse movement features.
    :param move_events: list of {'x':..., 'y':..., 't':...}
    :return: anomaly probability (0-1)
    """
    # At least 3 events are needed to get one acceleration value;
    # with less data there is no evidence, so report 0.0 (a design choice).
    if len(move_events) < 3:
        return 0.0

    # 1. Compute speed sequence
    speeds = []
    for i in range(1, len(move_events)):
        prev = move_events[i-1]
        curr = move_events[i]
        dx = curr['x'] - prev['x']
        dy = curr['y'] - prev['y']
        distance = (dx**2 + dy**2) ** 0.5
        time_diff = curr['t'] - prev['t']
        # Clamp the interval to avoid division by zero on duplicate timestamps
        speed = distance / max(0.001, time_diff)
        speeds.append(speed)

    # 2. Compute acceleration changes
    accelerations = [speeds[i] - speeds[i-1] for i in range(1, len(speeds))]

    # 3. Extract key features
    features = {
        'speed_mean': np.mean(speeds),
        'speed_std': np.std(speeds),
        'acc_max': max(accelerations),
        'acc_std': np.std(accelerations),
        'linearity': calc_linearity(move_events)
    }

    # 4. Predict with pretrained model (one row in, one probability out)
    return risk_model.predict([list(features.values())])[0]

Behavior feature dimensions:

Movement Speed: bots have constant speed, humans vary.

Acceleration: bots show saw-tooth acceleration patterns.

Trajectory Linearity: bots tend to move in straight lines.

Operation Interval: bots have highly consistent intervals.
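
Two of these dimensions, trajectory linearity and operation interval, can be computed with very little code. The sketch below shows one plausible definition of each in Python; the article does not specify the formulas, so the path-efficiency ratio used for calc_linearity and the coefficient of variation used for interval_cv are assumptions chosen for illustration.

```python
import math
import statistics


def calc_linearity(move_events):
    """Straight-line distance divided by travelled path length (1.0 = perfectly straight)."""
    if len(move_events) < 2:
        return 0.0
    path = sum(
        math.hypot(b['x'] - a['x'], b['y'] - a['y'])
        for a, b in zip(move_events, move_events[1:])
    )
    if path == 0:
        return 0.0
    first, last = move_events[0], move_events[-1]
    direct = math.hypot(last['x'] - first['x'], last['y'] - first['y'])
    return direct / path


def interval_cv(move_events):
    """Coefficient of variation of the time gaps; values near 0 mean metronomic timing."""
    gaps = [b['t'] - a['t'] for a, b in zip(move_events, move_events[1:])]
    if len(gaps) < 2:
        return 0.0
    mean = statistics.fmean(gaps)
    if mean == 0:
        return 0.0
    return statistics.pstdev(gaps) / mean
```

A scripted pointer that moves in a straight line at fixed intervals scores a linearity of 1.0 and an interval variation of 0.0, while human traces typically land somewhere in between on both.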

Third Defense Line: Dynamic Rule Engine

1. Real‑Time Rule Configuration

Use a dynamic rule engine such as Drools to define risk rules.

Drools rule example for high‑frequency access to a sensitive API:

rule "High Frequency Coupon Acquisition"
    salience 100
    no-loop true
when
    $req : Request(
        path == "/api/coupon/acquire",
        $uid : userId != null,
        $ip : clientIp
    )
    // count this user's other requests; the 10-second check below assumes
    // Request facts older than the window have been retracted or expired
    accumulate(
        Request(
            userId == $uid,
            path == "/api/coupon/acquire",
            this != $req,
            $ts : timestamp
        );
        $count : count($ts),
        $minTime : min($ts),
        $maxTime : max($ts)
    )
    eval($count > 30 && ($maxTime - $minTime) < 10000)
then
    insert(new BlockEvent($uid, $ip, "High Frequency Coupon"));
    $req.setBlock(true);
end

Rule engine advantages:

Real-time effect: new rules take effect within seconds of being pushed.

Complex conditions: supports multi-dimensional joint judgments.

Dynamic updates: no service restart required.
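
The windowed count that the coupon rule above depends on can also be sketched outside a rule engine. The Python below is a minimal illustration, assuming in-memory state and millisecond timestamps; SlidingWindowRule and its fields are hypothetical names. Because the limits are plain attributes, they can be changed at runtime, which mirrors the "no restart" property.

```python
from collections import defaultdict, deque


class SlidingWindowRule:
    """Blocks a user who makes more than `max_requests` within `window_ms`."""

    def __init__(self, max_requests, window_ms):
        self.max_requests = max_requests
        self.window_ms = window_ms
        self.hits = defaultdict(deque)  # user id -> request timestamps (ms)

    def should_block(self, user_id, ts_ms):
        q = self.hits[user_id]
        q.append(ts_ms)
        # Drop timestamps that have fallen out of the window.
        while q and ts_ms - q[0] >= self.window_ms:
            q.popleft()
        return len(q) > self.max_requests
```

Setting rule.max_requests = 30 and rule.window_ms = 10000 reproduces the thresholds used in the rule example.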

2. Multi‑Dimensional Correlation Analysis Model

Illustrated below is a risk scoring mechanism that combines IP risk, device risk, behavior anomaly, and historical profile.

Scoring formula:

RiskScore = 
    IPWeight * IPScore +
    DeviceWeight * DeviceScore +
    BehaviorWeight * AnomalyDegree +
    HistoryWeight * HistoricalRisk
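
As a worked example of the formula, the Python sketch below computes the weighted sum for one request. The weights are hypothetical values chosen to sum to 1, and each input score is assumed to be normalized to the range 0 to 1; the article specifies neither.

```python
# Hypothetical weights; in practice these would be tuned on labelled traffic.
WEIGHTS = {"ip": 0.30, "device": 0.30, "behavior": 0.25, "history": 0.15}


def risk_score(ip_score, device_score, anomaly_degree, historical_risk, weights=WEIGHTS):
    """Weighted sum of the four risk dimensions, each expected in [0, 1]."""
    return (
        weights["ip"] * ip_score
        + weights["device"] * device_score
        + weights["behavior"] * anomaly_degree
        + weights["history"] * historical_risk
    )
```

For inputs of 0.9, 0.8, 0.6 and 0.2 the terms are 0.27 + 0.24 + 0.15 + 0.03, giving a score of about 0.69. Keeping the weights summing to 1 keeps the result on the same 0 to 1 scale as the inputs, which makes a single blocking threshold easier to reason about.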

Ultimate Defense Architecture

Summary diagram of the million‑QPS defense architecture:

Core component breakdown:

Traffic Scrubbing Layer (CDN)

Filters static resource requests.

Absorbs >70% of traffic spikes.

Security Layer (Gateway Cluster)

Device fingerprinting for each request.

Distributed rate limiting at cluster level.

Rule engine for real‑time risk judgment.

Real‑Time Risk Layer (Flink)

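// Flink real-time risk processing (Java DataStream API)
// riskModel, RISK_THRESHOLD, RiskAggregator and blockRequest are assumed to be defined elsewhere.
riskStream
    .keyBy(req -> req.getDeviceId())                            // group by device
    .window(TumblingProcessingTimeWindows.of(Time.seconds(10))) // 10-second tumbling window
    .aggregate(new RiskAggregator())                            // aggregate risk metrics
    .map(riskData -> {
        double score = riskModel.predict(riskData);
        if (score > RISK_THRESHOLD) {
            // block high-risk request
            blockRequest(riskData.getRequestId());
        }
        return riskData; // pass the aggregate downstream
    });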
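
To show what the keyed window computes, here is a small Python sketch that counts requests per device in fixed 10-second buckets. It is a batch illustration of the windowing logic only, not a streaming job; the function name and the (device_id, timestamp) tuple layout are assumptions.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """events: iterable of (device_id, timestamp_seconds).

    Returns {(device_id, window_start): request_count}.
    """
    counts = defaultdict(int)
    for device_id, ts in events:
        # Align each timestamp to the start of its fixed-size window.
        window_start = int(ts // window_seconds) * window_seconds
        counts[(device_id, window_start)] += 1
    return dict(counts)
```

Each (device, window) count is the kind of aggregate that would then be fed to the risk model for scoring.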

Data Support Layer

Redis stores real‑time risk profiles.

Flink computes behavior feature metrics.

Rule management console for dynamic strategy adjustments.

Hard‑Learned Lessons

1. The Trap of IP Whitelists

Scenario: adding partner IPs to a whitelist.

Disaster: attackers compromise partner servers and launch attacks.

Solution: validate requests with device fingerprinting and behavior analysis.

2. Static Rate‑Limit Threshold Pitfalls

Scenario: fixed 5,000 QPS limit.

Problem: legitimate traffic during promotions exceeds limit and gets blocked.

Optimization: dynamically adjust thresholds based on historical traffic.

// Dynamic threshold adjustment algorithm
public class DynamicThreshold {
    // Adjust based on last week’s same‑time traffic
    public static int calculateThreshold(String api) {
        // 1. Get historical QPS
        double base = getHistoricalQps(api);
        // 2. Apply today’s growth factor
        double growth = getGrowthFactor();
        // 3. Keep 20% safety margin
        return (int)(base * growth * 0.8);
    }
}
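
The arithmetic is easy to check with concrete numbers. The Python sketch below treats the threshold as forecast traffic plus headroom; the headroom parameter and the sample figures are assumptions for illustration.

```python
def calculate_threshold(historical_qps, growth_factor, headroom=0.2):
    """Forecast traffic (history x growth) plus a safety margin on top."""
    expected = historical_qps * growth_factor
    return round(expected * (1 + headroom))
```

With a historical 5,000 QPS and a promotion-day growth factor of 1.5, the expected load is 7,500 QPS and the limit comes out at 9,000, so legitimate promotion traffic fits under the threshold while a flood well above it is still cut off.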

3. Ignoring Bandwidth Costs

Disaster: a 10 Gbps attack pushed the monthly bandwidth bill 200% over budget.

Countermeasures:

Put a CDN in front to absorb static traffic.

Enable cloud provider DDoS protection services.

Configure bandwidth auto‑circuit‑breaker.
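
A bandwidth auto-circuit-breaker can be as simple as tripping after several consecutive samples above a budgeted limit. The Python sketch below illustrates that idea under simplifying assumptions: the class name, the Mbps samples, and the rule of closing again on the first sample back under the limit are all hypothetical choices.

```python
class BandwidthBreaker:
    """Opens after `trip_after` consecutive samples above `limit_mbps`."""

    def __init__(self, limit_mbps, trip_after):
        self.limit_mbps = limit_mbps
        self.trip_after = trip_after
        self.over_count = 0   # consecutive over-limit samples seen so far
        self.open = False     # True means traffic should be shed or rerouted

    def record(self, observed_mbps):
        """Feed one bandwidth sample; returns True while the breaker is open."""
        if observed_mbps > self.limit_mbps:
            self.over_count += 1
        else:
            # Back under the limit: reset and close (simplified recovery).
            self.over_count = 0
            self.open = False
        if self.over_count >= self.trip_after:
            self.open = True
        return self.open
```

Requiring several consecutive over-limit samples avoids tripping on a single short spike; a real deployment would likely add a cool-down period before closing again.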

True defense is not about preventing attacks entirely, but making the attacker pay far more than they gain; when your defense cost is lower than the attack cost, the battle ends.
