How a Low Guava RateLimiter Triggered RPC Thread‑Pool Exhaustion

A developer investigates why an RPC service suddenly rejected many requests, discovers that a Guava RateLimiter with a limit of only ten calls per minute caused thread‑pool saturation, and explains the misinterpretations, monitoring data, and steps taken to pinpoint the root cause.

ITPUB

Fault Background

After noticing a surge of RPC request errors in the production environment, the team saw that the errors were caused by thread‑pool rejections. Although the affected service was non‑critical, the volume of failures prompted a deeper investigation.

Initial Questions

What was the exact increase in request volume?

What were the thread‑pool's core size, maximum size, and queue length?

What rejection policy was configured?

Which interfaces showed latency spikes?

What were the CPU and GC metrics?

Was the request surge the sole reason for the pool being full?

Investigation Process

2.1 Request Volume

Single‑machine RPC QPS rose from 300 /s to 450 /s.

Kafka message QPS remained stable at 50 /s.

No other entry points or scheduled tasks were identified.

The modest 150 /s increase seemed insufficient to saturate the pool, suggesting another root cause.

2.2 RPC Thread‑Pool Configuration

Only one RPC port (8001) was affected. Its thread‑pool settings were:

Core threads: 10

Maximum threads: 1024 (excessively high)

Queue length: 0

Rejection policy: immediate exception

Between 20:11 and 20:13, threads grew linearly from 10 to 1024.
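The settings above map onto a standard java.util.concurrent.ThreadPoolExecutor roughly as follows. This is a minimal sketch, not the RPC framework's actual code: the class name RpcPoolSketch, the helper names, and the 60-second keep-alive are assumptions made for illustration. A zero-length queue is modeled with a SynchronousQueue, and "immediate exception" with AbortPolicy.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class RpcPoolSketch {
    /** Pool with no queue: a task either gets an idle or new thread, or is rejected. */
    static ThreadPoolExecutor newPool(int core, int max) {
        return new ThreadPoolExecutor(
                core, max,
                60, TimeUnit.SECONDS,                  // keep-alive above core (assumed value)
                new SynchronousQueue<>(),              // queue length 0: direct hand-off
                new ThreadPoolExecutor.AbortPolicy()); // throws RejectedExecutionException
    }

    /** Submits tasks that all block on `release`; returns how many were rejected. */
    static int submitBlocking(ThreadPoolExecutor pool, int tasks, CountDownLatch release) {
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // stand-in for a request stuck in processing
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newPool(10, 1024); // values from the article
        CountDownLatch release = new CountDownLatch(1);
        int rejected = submitBlocking(pool, 1100, release);
        System.out.println("threads=" + pool.getPoolSize() + " rejected=" + rejected);
        release.countDown(); // let the blocked tasks finish
        pool.shutdown();
    }
}
```

With this shape of pool, nothing absorbs a slowdown: once every thread is held by a stalled request, each new request is rejected at submission time.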

2.3 Reasoning About Thread Count

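By Little's law, keeping 1024 threads busy at 450 QPS means each request holds a thread for about 1024 / 450 ≈ 2.3 s on average. That is far longer than this service normally takes, so the extra request volume alone could not explain the saturation; processing time must have degraded.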
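The thread-count reasoning is an application of Little's law (busy threads = arrival rate × time each request holds a thread). The sketch below only restates that arithmetic with the article's figures; the class and method names are made up for illustration, and the 5.7 ms value is the baseline latency reported in section 2.4.

```java
final class LittlesLaw {
    /** Average time each request must hold a thread for `busyThreads` to be occupied at `qps`. */
    static double holdTimeSeconds(int busyThreads, double qps) {
        return busyThreads / qps;
    }

    /** Average number of threads kept busy at `qps` when each request takes `latencySeconds`. */
    static double busyThreads(double qps, double latencySeconds) {
        return qps * latencySeconds;
    }

    public static void main(String[] args) {
        // Hold time needed to occupy all 1024 threads at 450 QPS.
        System.out.printf("hold time: %.2f s%n", holdTimeSeconds(1024, 450));
        // Threads busy at 450 QPS with the 5.7 ms baseline latency.
        System.out.printf("busy threads: %.2f%n", busyThreads(450, 0.0057));
    }
}
```

At the baseline latency only about three threads are busy on average, whereas filling all 1024 requires each request to occupy a thread for more than two seconds.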

2.4 Interface Latency Spike

Average latency jumped from 5.7 ms to 17 000 ms.

The team mistakenly blamed the increase on RPC queueing, overlooking two key facts:

The thread‑pool queue length is zero, so requests never wait in a queue: each one is either handed to a thread or rejected immediately.

Server‑side latency monitoring excludes connection and queue times; it only measures actual processing time inside the RPC thread pool.

The real issue was severe degradation in the processing stage, causing throughput to drop and the pool to expand until it hit its limit.

2.5 Other Latency Checks

CPU load was low (≈15% busy).

GC was healthy (no Full GC, occasional young GC).

Downstream RPC, SQL, Redis, and other external calls showed no noticeable latency changes.

2.6 Code Review

The problematic interface is a large, aggregated SPI defined by an upstream BCP verification system. Its implementation contains many branches and deep nesting, making the code hard to read.

2.7 Trace Analysis

Trace logs highlighted a one‑second pause between two SQL calls, yet SQL execution time remained fast, indicating the stall occurred elsewhere.

2.8 Searching for Blocking Causes

With external calls ruled out, the remaining suspects were in‑process blockers such as locks and other synchronization primitives. A search for "synchronized" yielded nothing, but a Guava RateLimiter was found in a class field:

private static final RateLimiter RATE_LIMITER = RateLimiter.create(10, 20, TimeUnit.SECONDS);

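The team identified the limiter's threshold as only ten calls per minute, far below the load reaching this interface. (In Guava's API the three‑argument overload is create(permitsPerSecond, warmupPeriod, unit), so the line as shown reads as 10 permits per second with a 20‑second warm‑up; either way, the permitted rate is well below the 450 QPS arriving at the machine.) Once the request rate exceeded the threshold, threads blocked waiting for permits instead of finishing their work. Each blocked thread stayed occupied, so the pool kept creating threads until it reached its maximum of 1024, after which further requests were rejected.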
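To see why a blocking limiter turns excess request rate into thread hold time, consider a simplified stand‑in that hands out evenly spaced time slots. This is not Guava's algorithm (which also handles stored permits and warm‑up); SlotLimiter and its methods are hypothetical names, and the rate of 10 permits per second is chosen purely for illustration.

```java
import java.util.concurrent.TimeUnit;

/** Simplified stand-in for a blocking rate limiter: one permit per fixed interval. */
final class SlotLimiter {
    private final long intervalNanos; // spacing between consecutive permits
    private long nextFreeNanos;       // earliest time the next permit can be granted

    SlotLimiter(double permitsPerSecond, long startNanos) {
        this.intervalNanos = (long) (TimeUnit.SECONDS.toNanos(1) / permitsPerSecond);
        this.nextFreeNanos = startNanos;
    }

    /** Reserves the next slot; returns how long a caller arriving at nowNanos would block. */
    synchronized long reserveWaitNanos(long nowNanos) {
        long slot = Math.max(nextFreeNanos, nowNanos);
        nextFreeNanos = slot + intervalNanos;
        return slot - nowNanos;
    }

    public static void main(String[] args) {
        SlotLimiter limiter = new SlotLimiter(10.0, 0L);
        long wait = 0;
        // 450 requests arriving at the same instant, as in one second of peak traffic.
        for (int i = 0; i < 450; i++) {
            wait = limiter.reserveWaitNanos(0L);
        }
        System.out.printf("last caller blocks for %.1f s%n", wait / 1e9);
    }
}
```

In this model the 450th caller of a single burst would block for 44.9 s, and every later arrival queues behind it. Each waiting caller occupies a pool thread for the whole wait, which is consistent with the multi‑second latencies and steady thread growth the monitoring showed. At ten permits per minute the waits would be 60 times longer.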

Conclusion

The Guava RateLimiter’s overly low threshold (10 calls per minute) was the root cause of the RPC thread‑pool exhaustion. When the request rate surpassed this limit, threads blocked waiting for permits, the pool expanded to its maximum size of 1024, and subsequent requests were rejected.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
