Defending Against Million‑QPS Attacks: Rate Limiting, Fingerprinting, and Dynamic Rules
This article explains why a million-QPS flood can cripple a system, outlines attackers' tactics, and presents a three-layer defense strategy (gateway rate limiting with Nginx + Lua, distributed flow control via Sentinel, device fingerprinting, behavior analysis, and a dynamic rule engine) to protect high-traffic services.
Introduction
When malicious traffic surges like a tsunami, how can you keep your system safe?
Why a million QPS is deadly
Attackers use three core weapons:
IP ocean tactics: rotating 100k+ proxy IPs, bypassing traditional IP throttling.
Device cloning: forged browser fingerprints to mimic real devices.
Protocol‑level precise attacks: crafted HTTP requests that evade basic WAF rules.
The resulting chain reaction can exhaust thread pools, database connections, and Redis caches, and can trigger cascading circuit breakers.
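To see why resources run out so quickly, a back-of-the-envelope estimate with Little's law (average in-flight requests = arrival rate × time per request) can help. The sketch below is only an illustration; the function names and the figures in the example (50 ms per request, 200 worker threads per node) are assumptions, not measurements from this article.

```python
import math


def required_concurrency(qps: float, latency_s: float) -> float:
    """Little's law: average in-flight requests = arrival rate * time in system."""
    return qps * latency_s


def nodes_needed(qps: float, latency_s: float, threads_per_node: int) -> int:
    """Nodes required if every in-flight request occupies one worker thread."""
    return math.ceil(required_concurrency(qps, latency_s) / threads_per_node)


if __name__ == "__main__":
    # Assumed figures: 1,000,000 QPS, 50 ms per request, 200 threads per node.
    print("in-flight requests:", required_concurrency(1_000_000, 0.05))
    print("nodes needed:", nodes_needed(1_000_000, 0.05, 200))
```

If latency rises under load, the required concurrency rises with it, which is how a flood turns into pool exhaustion.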
First defense line: Basic rate limiting and circuit breaking
1. Gateway rate limiting
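Implement leaky-bucket rate limiting at the gateway using Nginx + Lua (the resty.limit.req module from lua-resty-limit-traffic).
# In the http block: shared memory zone that holds the limiter state
lua_shared_dict payment_limit 10m;

location /api/payment {
    access_by_lua_block {
        local limiter = require "resty.limit.req"
        -- leaky bucket: 1000 req/s steady rate, up to 2000 excess requests are delayed,
        -- anything beyond rate + burst is rejected.
        -- The first argument is the name of the lua_shared_dict declared above.
        local lim, err = limiter.new("payment_limit", 1000, 2000)
        if not lim then
            ngx.log(ngx.ERR, "failed to init limiter: ", err)
            return ngx.exit(500)
        end

        local key = ngx.var.remote_addr  -- limit per client IP
        local delay, err = lim:incoming(key, true)
        if not delay then
            if err == "rejected" then
                ngx.header.content_type = "application/json"
                ngx.status = 429
                ngx.say([[{"code":429,"msg":"Too many requests"}]])
                return ngx.exit(429)
            end
            ngx.log(ngx.ERR, "limit error: ", err)
            return ngx.exit(500)
        end

        -- Request is within the burst allowance: delay it to smooth the traffic
        if delay >= 0.001 then
            ngx.sleep(delay)
        end
    }
}
This uses the resty.limit.req module to enforce per-IP limits and returns a 429 status with a JSON error when the limit is exceeded.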
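For comparison with the gateway limiter, which in lua-resty-limit-traffic follows the leaky-bucket method, the sketch below shows what a token bucket looks like. It is a minimal single-process illustration, not the module's implementation; the class name and the injectable `now` parameter (used to make the behavior easy to test) are assumptions.

```python
import time
from typing import Optional


class TokenBucket:
    """In-memory token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available; return False when the bucket is empty."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill for the elapsed time, never exceeding the bucket capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A token bucket lets a burst through immediately while tokens remain, whereas a leaky bucket delays excess requests to keep the outgoing rate steady.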
2. Distributed circuit breaking
Use Sentinel cluster flow control for high‑traffic scenarios.
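import java.util.Collections;
import javax.annotation.PostConstruct; // jakarta.annotation.PostConstruct on newer stacks

import org.springframework.context.annotation.Configuration;

import com.alibaba.csp.sentinel.slots.block.ClusterRuleConstant;
import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.ClusterFlowConfig;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;

@Configuration // assumes a Spring context, so that @PostConstruct is invoked
public class SentinelConfig {
    @PostConstruct
    public void initFlowRules() {
        // Cluster flow control is configured on a regular FlowRule
        FlowRule rule = new FlowRule();
        rule.setResource("createOrder");              // protected resource
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS);   // QPS limit
        rule.setCount(50000);                         // 50k QPS cluster threshold
        rule.setClusterMode(true);                    // enable cluster mode
        rule.setClusterConfig(new ClusterFlowConfig()
            .setFlowId(123L)                                            // globally unique ID
            .setThresholdType(ClusterRuleConstant.FLOW_THRESHOLD_GLOBAL) // global threshold
        );
        FlowRuleManager.loadRules(Collections.singletonList(rule));
    }
}
Sentinel's token server centrally manages quotas, and each gateway node requests tokens in real time, ensuring balanced throttling across the cluster.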
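The idea of a central quota holder can be illustrated with a toy model. The sketch below is not Sentinel's implementation; `TokenServer` and its fixed one-second window are assumptions chosen to keep the example short. Every node asks the same object for a token, so the cluster as a whole never exceeds the global budget regardless of how traffic is spread across nodes.

```python
import threading


class TokenServer:
    """Toy central quota holder: one global budget per one-second window."""

    def __init__(self, global_qps: int):
        self.global_qps = global_qps
        self._window = None   # index of the current one-second window
        self._used = 0        # tokens handed out in the current window
        self._lock = threading.Lock()

    def request_token(self, now_s: float, count: int = 1) -> bool:
        """Called by any node; returns True if the cluster-wide budget allows it."""
        window = int(now_s)
        with self._lock:
            if window != self._window:
                # A new second has started: reset the shared counter
                self._window = window
                self._used = 0
            if self._used + count > self.global_qps:
                return False
            self._used += count
            return True
```

In a real deployment the request would travel over the network, so each node also needs a fallback (for example a local limit) for when the token server is unreachable.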
Second defense line: Device fingerprint and behavior analysis
1. Browser fingerprint generation
Front‑end code creates a stable fingerprint using Canvas and WebGL, which remains consistent even if the IP changes.
// Front-end device fingerprint generation
// md5() is assumed to come from an external hashing library; it is not a browser built-in.
function generateDeviceFingerprint() {
    // 1. Gather basic device info
    const baseInfo = [
        navigator.userAgent,
        navigator.platform,
        screen.width + 'x' + screen.height,
        navigator.language
    ].join('|');

    // 2. Canvas fingerprint
    const canvas = document.createElement('canvas');
    const ctx = canvas.getContext('2d');
    ctx.fillStyle = '#f60';
    ctx.fillRect(0, 0, 100, 30);
    ctx.fillStyle = '#069';
    ctx.font = '16px Arial';
    ctx.fillText('防御即艺术', 10, 20); // "Defense is an art"; any fixed string works
    const canvasData = canvas.toDataURL();

    // 3. WebGL fingerprint
    // A canvas that already has a 2D context returns null for 'webgl', so use a second canvas.
    let renderer = '';
    const gl = document.createElement('canvas').getContext('webgl');
    const debugInfo = gl && gl.getExtension('WEBGL_debug_renderer_info');
    if (debugInfo) {
        renderer = gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL);
    }

    // 4. Combine into final fingerprint
    return md5(baseInfo + canvasData + renderer);
}
Fingerprint characteristics:
Stability: >98% consistency on the same device.
Uniqueness: <0.1% collision across different devices.
Stealth: Invisible to users and hard to clear.
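Because the fingerprint stays the same when the IP changes, the server can use it as a key to spot one device hiding behind many proxy IPs. The sketch below is one possible way to do that; `FingerprintTracker` and the threshold of 5 distinct IPs are illustrative assumptions, and a production version would need expiry (for example Redis keys with a TTL) instead of an unbounded in-memory map.

```python
from collections import defaultdict


class FingerprintTracker:
    """Tracks which client IPs have presented each device fingerprint."""

    def __init__(self, max_ips_per_device: int = 5):
        self.max_ips = max_ips_per_device
        self._ips = defaultdict(set)  # fingerprint -> set of IPs seen

    def observe(self, fingerprint: str, ip: str) -> bool:
        """Record a request; return True if the device has used too many IPs."""
        self._ips[fingerprint].add(ip)
        return len(self._ips[fingerprint]) > self.max_ips
```

A device flagged this way can then be rate limited by fingerprint rather than by IP.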
2. Behavior analysis model
Python code extracts mouse movement features and predicts risk with a pre‑trained model.
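import numpy as np

# risk_model (a pre-trained classifier) and calc_linearity() are assumed to be defined elsewhere.

def analyze_mouse_behavior(move_events):
    """Analyze mouse movement characteristics.

    :param move_events: list of {'x':..., 'y':..., 't':...}, ordered by time
    :return: anomaly probability (0-1)
    """
    # At least 3 events are needed to obtain one acceleration value
    if len(move_events) < 3:
        raise ValueError("need at least 3 move events")

    # 1. Compute speed series
    speeds = []
    for i in range(1, len(move_events)):
        prev, curr = move_events[i-1], move_events[i]
        dx = curr['x'] - prev['x']
        dy = curr['y'] - prev['y']
        distance = (dx**2 + dy**2) ** 0.5
        time_diff = curr['t'] - prev['t']
        speed = distance / max(0.001, time_diff)  # guard against a zero time difference
        speeds.append(speed)

    # 2. Acceleration changes (difference between consecutive speeds)
    accelerations = [speeds[i] - speeds[i-1] for i in range(1, len(speeds))]

    # 3. Feature extraction
    features = {
        'speed_mean': np.mean(speeds),
        'speed_std': np.std(speeds),
        'acc_max': max(accelerations),
        'acc_std': np.std(accelerations),
        'linearity': calc_linearity(move_events)
    }

    # 4. Predict risk (assumes a scikit-learn-style classifier whose class 1 means "anomalous")
    vector = [list(features.values())]
    return float(risk_model.predict_proba(vector)[0][1])
Key behavior dimensions: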
Movement speed: Bots show constant speed, humans vary.
Acceleration: Bots produce saw-tooth patterns.
Trajectory linearity: Bots often move in straight lines.
Operation interval: Bots have highly regular intervals.
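The analysis code calls a `calc_linearity` helper that the article does not show. The sketch below is one plausible definition (straight-line distance divided by path length), together with a hypothetical `interval_regularity` helper for the operation-interval dimension. Both are assumptions for illustration, not the author's implementation.

```python
import math
import statistics


def calc_linearity(move_events):
    """Straight-line distance divided by path length; 1.0 is a perfectly straight path."""
    if len(move_events) < 2:
        return 0.0
    path = 0.0
    for prev, curr in zip(move_events, move_events[1:]):
        path += math.hypot(curr['x'] - prev['x'], curr['y'] - prev['y'])
    if path == 0:
        return 0.0  # the pointer never moved
    first, last = move_events[0], move_events[-1]
    direct = math.hypot(last['x'] - first['x'], last['y'] - first['y'])
    return direct / path


def interval_regularity(move_events):
    """Coefficient of variation of time gaps; values near 0 suggest machine-like timing."""
    gaps = [b['t'] - a['t'] for a, b in zip(move_events, move_events[1:])]
    if len(gaps) < 2:
        return None  # not enough data to judge
    mean = statistics.fmean(gaps)
    if mean <= 0:
        return None
    return statistics.pstdev(gaps) / mean
```

Human traces usually yield a linearity well below 1 and irregular gaps, although the exact cut-offs have to be learned from real traffic.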
Third defense line: Dynamic rule engine
1. Real‑time rule configuration
Drools can express fine‑grained risk rules.
rule "High-frequency access to sensitive API"
    salience 100
    no-loop true
when
    $req : Request(path == "/api/coupon/acquire", $uid : userId != null, $ip : clientIp, $now : timestamp)
    // count the same user's other requests in the 10 seconds before $req
    // (timestamp is assumed to be in epoch milliseconds)
    accumulate(
        Request(userId == $uid, path == "/api/coupon/acquire", this != $req,
                timestamp > $now - 10000, timestamp <= $now, $ts : timestamp);
        $count : count($ts);
        $count > 30
    )
then
    insert(new BlockEvent($uid, $ip, "High-frequency coupon"));
    $req.setBlock(true);
end
Advantages: instant activation, multi-dimensional conditions, dynamic updates without service restarts.
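The counting logic behind such a rule can be sketched outside the rule engine as a sliding-window counter. This is a simplified illustration with a hypothetical class name; it counts the current request as well, keeps state in memory, and assumes timestamps arrive in non-decreasing order per user.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Flags a user once more than `limit` requests fall inside the time window."""

    def __init__(self, limit: int = 30, window_ms: int = 10_000):
        self.limit = limit
        self.window_ms = window_ms
        self._hits = defaultdict(deque)  # user_id -> recent request timestamps

    def record(self, user_id: str, ts_ms: int) -> bool:
        """Record one request; return True if the user should be blocked."""
        hits = self._hits[user_id]
        hits.append(ts_ms)
        # Drop timestamps that have fallen out of the window
        while hits and hits[0] <= ts_ms - self.window_ms:
            hits.popleft()
        return len(hits) > self.limit
```

Keeping the limit and window as constructor parameters mirrors the point of a rule engine: the thresholds can be changed without touching the counting code.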
2. Multi‑dimensional correlation analysis model
Risk score combines several weighted factors.
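A minimal sketch of such a weighted score follows, mirroring the four factors named in this section's formula. The function name, the example weights, and the assumption that each score lies in [0, 1] are illustrative and not taken from the article.

```python
def risk_score(scores: dict, weights: dict) -> float:
    """Weighted sum of per-dimension scores (each score expected in [0, 1])."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


if __name__ == "__main__":
    # Assumed weights that sum to 1, so the result also stays in [0, 1]
    weights = {"ip": 0.2, "device": 0.3, "behavior": 0.4, "history": 0.1}
    scores = {"ip": 1.0, "device": 0.0, "behavior": 0.5, "history": 0.0}
    print(risk_score(scores, weights))
```

If the weights sum to 1, the result can be compared directly against a fixed blocking threshold.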
risk = IP_weight * IP_score +
       Device_weight * Device_score +
       Behavior_weight * Anomaly_score +
       History_weight * Historical_risk
Ultimate defense architecture
The architecture consists of three layers:
Traffic cleaning (CDN) – filters static requests and absorbs >70% of attack traffic.
Security layer (gateway cluster) – device fingerprinting, distributed rate limiting, rule engine.
Real-time risk layer (Flink) – computes behavior metrics, with Redis, Flink, and a rule console providing data support.
Hard‑learned lessons
1. IP whitelist trap
Adding partner IPs to a whitelist can backfire if the partner is compromised. Use device fingerprint verification and behavior analysis instead.
2. Static threshold disaster
Fixed thresholds (e.g., 5,000 QPS) cause false positives during legitimate traffic spikes. Adopt dynamic thresholds based on historical traffic.
// Dynamic threshold algorithm
public class DynamicThreshold {
    // Calculate the threshold from last week's traffic and a growth factor.
    // getHistoricalQps() and getGrowthFactor() are assumed to be implemented elsewhere.
    public static int calculateThreshold(String api) {
        double base = getHistoricalQps(api); // QPS at the same time last week
        double growth = getGrowthFactor();   // today's growth coefficient
        return (int) (base * growth * 0.8);  // keep 20% safety margin
    }
}
3. Ignoring bandwidth cost
A 10 Gbps attack can double the monthly bandwidth budget. Mitigate by:
Front‑end CDN to absorb static traffic.
Cloud provider DDoS protection services.
Automatic bandwidth circuit‑breaker mechanisms.
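An automatic bandwidth circuit breaker could look roughly like the sketch below: it opens after several consecutive over-budget samples and stays open for a cooldown period. The class name, the sampling model, and the default values are assumptions; what "open" triggers (for example switching to a static page or handing traffic to a scrubbing service) depends on the deployment.

```python
class BandwidthBreaker:
    """Opens when measured egress exceeds a budget for several consecutive samples."""

    def __init__(self, limit_mbps: float, trip_after: int = 3, cooldown_s: float = 60.0):
        self.limit_mbps = limit_mbps
        self.trip_after = trip_after
        self.cooldown_s = cooldown_s
        self._over = 0            # consecutive samples above the limit
        self._open_until = None   # time at which the breaker closes again

    def sample(self, mbps: float, now_s: float) -> bool:
        """Feed one bandwidth sample; return True while the breaker is open."""
        if self._open_until is not None:
            if now_s < self._open_until:
                return True
            # Cooldown elapsed: close the breaker and start counting afresh
            self._open_until = None
            self._over = 0
        if mbps > self.limit_mbps:
            self._over += 1
        else:
            self._over = 0
        if self._over >= self.trip_after:
            self._open_until = now_s + self.cooldown_s
            return True
        return False
```

Requiring several consecutive samples avoids tripping on a single short spike.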
True defense makes the attacker pay many times more than the cost of the attack; when your defense cost is lower than the attacker’s, the battle ends.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
