How to Detect Hotspots with a Rate‑Limiting System Using Concurrent LRU Maps

This article explains how a rate‑limiting system can identify hot‑spot resources by using time‑window statistics, a concurrent LRU hash map, and cluster‑wide aggregation to efficiently track and evict low‑frequency keys while maintaining high throughput.

21CTO
21CTO
21CTO
How to Detect Hotspots with a Rate‑Limiting System Using Concurrent LRU Maps

1. Characteristics of Hotspots

a) Massive statistical base

Hotspots can be pre‑known (e.g., a list of flash‑sale items) or sudden and unpredictable (e.g., products being刷单 or emerging popular items). For known lists we can count directly, but for sudden hotspots the key space can be billions (e.g., product IDs on Taobao). Recording every ID is infeasible, so a data structure that keeps only the most frequently accessed IDs within a bounded size is required.

b) Time‑window dimension of hotspot statistics

Hotspot detection must consider accesses per unit time rather than total volume. For example, an item accessed 3,600 times uniformly over an hour (1 per second) may be within system capacity, whereas another accessed 60 times all in a single second is a spike and should be treated as a hotspot. The granularity of the time window is crucial for accurate detection.

c) Challenges in distributed systems

Hotspot statistics may need to be computed on a single node or across a cluster. Efficiently aggregating statistics in a cluster while keeping rate‑limit rules enforceable on each node is essential.

2. Solving the Problems

2.1 Find a suitable data structure

The structure must support fast reads/writes, high concurrency, low memory footprint, controllable size, and eviction of low‑frequency entries. Google’s ConcurrentLinkedHashMap (used in Guava) satisfies these requirements.

a) Concurrency support – Implements ConcurrentMap and follows concurrent map principles.

b) LRU ordering via a doubly‑linked list – Entries are moved to the head on access; when capacity is exceeded, the tail entry (least recently used) is evicted, guaranteeing that the head holds the most recently used entries.

c) Two views – Synchronous view for callers and an asynchronous view for the internal linked list.

d) Performance compared to ConcurrentHashMap – Slightly slower but acceptable; official benchmark data shown below.

Performance comparison chart
Performance comparison chart

Thus, using this LRU concurrent map we can retain a limited number of high‑frequency keys within each time window, discarding low‑frequency keys.

2.2 Time‑window based counting

The rate‑limiting system uses a sliding window to smooth QPS statistics. Each “cluster point” (i.e., a method to be monitored) has a 2‑second sampling window divided into 20 slots. The average of these slots yields a smoothed QPS, reducing spikes.

Sliding window diagram
Sliding window diagram

More details on exponential moving averages can be found on Wikipedia.

2.3 Handling cluster aggregation

2.3.1 The rate‑limiting system exposes a port (8719) that provides per‑node QPS statistics, acting as a “backdoor” for data collection.

2.3.2 To aggregate data across thousands of machines, a queue‑based approach is used: tasks are placed in a queue, multiple threads execute them, and a dedicated thread merges the results. In a large cluster (e.g., “buy”), the aggregated result can be returned within 2 seconds.

Cluster aggregation diagram
Cluster aggregation diagram

After obtaining the cluster‑wide summary, the information is propagated to individual machines via Diamond, with a latency of about 2–3 seconds.

3. Summary and Performance Report

The data structure occupies roughly 8 KB. Its default configuration stores a 2‑second sliding window with 20 slots, each holding up to 200 parameters, resulting in a worst‑case size of about 8 KB plus 4,000 parameters.

Throughput on a typical development machine reaches 290,000 QPS; production machines are expected to perform even better.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Distributed Systemsrate limitingSliding WindowBackend Performancehotspot detectionconcurrent data structures
21CTO
Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.