Why Your Hystrix Semaphore Limits Fail: Hidden JDK GC Bug and Correct Rate Calculations
This article explains how a miscalculated Hystrix semaphore quota, combined with a JDK bug involving GCLocker‑initiated GC, can cause unexpected request rejections. It then presents a correct way to compute concurrency limits, along with alternative buffering strategies based on Java semaphores and thread pools.
Problem
We previously used Hystrix semaphore‑based rate limiting and sized the quota from peak QPS (1000) and average response time (15 ms). The reasoning was that each permit can handle 1000 ms / 15 ms ≈ 66 requests per second, so 15 permits (15 × 66.7 ≈ 1000) seemed sufficient for 1000 QPS.
However, the error logs kept showing HystrixRuntimeException ... REJECTED_SEMAPHORE_EXECUTION even after we raised the permit count to 50, so the root cause was still unresolved.
Investigation Steps
Check the average request latency (≈ 17 ms). It was close to the 15 ms estimate, which rules out slow or blocked requests.
Inspect the Hystrix code to confirm that permits are always released; a leak would gradually exhaust them.
Write a small demo to reproduce the issue; the problem only appears during early service startup before initialization completes.
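The leak check in the second step can be illustrated with java.util.concurrent.Semaphore. The sketch below is minimal, its class and method names are hypothetical, and Hystrix's own code is structured differently. If a permit is released only on the success path, every failing call leaks one, until no permits remain and all later calls are rejected. Releasing in a finally block keeps the count stable.

```java
import java.util.concurrent.Semaphore;

// Hypothetical demo: compares a leaky acquire/release pattern with a safe one.
final class PermitLeakCheck {
    /** Leaky pattern: the permit is returned only on the success path. */
    static void leakyCall(Semaphore s, Runnable work) {
        if (!s.tryAcquire()) throw new IllegalStateException("rejected");
        work.run();   // if this throws, the release() below is skipped
        s.release();
    }

    /** Safe pattern: release in finally so every acquired permit comes back. */
    static void safeCall(Semaphore s, Runnable work) {
        if (!s.tryAcquire()) throw new IllegalStateException("rejected");
        try {
            work.run();
        } finally {
            s.release();
        }
    }

    /** Runs `failures` failing calls through one pattern and reports the permits left. */
    static int permitsLeftAfterFailures(boolean safe, int permits, int failures) {
        Semaphore s = new Semaphore(permits);
        Runnable failing = () -> { throw new RuntimeException("simulated failure"); };
        for (int i = 0; i < failures; i++) {
            try {
                if (safe) safeCall(s, failing); else leakyCall(s, failing);
            } catch (RuntimeException expected) {
                // either the simulated failure or a rejection; only the permit count matters here
            }
        }
        return s.availablePermits();
    }
}
```

With 3 permits and 5 failing calls, the leaky pattern ends with 0 permits available and the safe pattern keeps all 3.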
JDK Bug Discovery
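The GC logs showed two young collections (YGC) back to back, only 0.000187 s apart, and the application was paused for about 160 ms. The logs labeled the collection GCLocker Initiated GC. The GCLocker defers garbage collection while a thread is inside a JNI critical region (for example, one opened with GetPrimitiveArrayCritical), because native code there may hold raw pointers to objects that must not move. Other threads are blocked from entering such regions while a collection is pending. Once the last thread leaves, the JVM runs the deferred collection.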
The pause was caused by JDK bug JDK-8048556, a known issue in the JDK version we were using.
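The GC log is still the place to find out why a collection ran, since the GCLocker Initiated GC label only appears there. To correlate rejections with collections at runtime, the standard GarbageCollectorMXBean interface offers a rough signal. Below is a minimal sketch with a hypothetical class name, which sums the accumulated collection time reported by each collector.

Two caveats apply. First, getCollectionTime() is accumulated elapsed collection time, which is not the same as stop‑the‑world pause time for concurrent collectors. Second, it returns -1 when a collector does not report it.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: a coarse, cumulative GC-time counter for the current JVM.
final class GcTimeProbe {
    /** Sums approximate accumulated collection time (ms) across collectors, skipping -1 ("undefined"). */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }
}
```

Sampling this value at the start and end of each monitoring window and logging the delta next to that window's rejection count makes a GC‑driven spike easy to spot. The cause of the collection still has to come from the GC log.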
Correct Rate‑Limiting Calculation
The proper formula is:
Concurrency (permits) / average request time (s) > QPS (requests/s)
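The inequality can be checked mechanically. This sketch uses hypothetical names and works in whole milliseconds, so the boundary case is exact rather than subject to floating‑point rounding.

```java
// Hypothetical helper implementing: permits / avgLatency(s) > qps.
// Multiplying both sides by 1000 keeps everything in exact integer arithmetic.
final class PermitSizing {
    /** True if `permits` concurrent slots can sustain `qps` at the given average latency. */
    static boolean isSufficient(long permits, long avgLatencyMs, long qps) {
        return permits * 1000 > qps * avgLatencyMs;
    }

    /** Smallest permit count that satisfies the strict inequality. */
    static long minimumPermits(long avgLatencyMs, long qps) {
        return qps * avgLatencyMs / 1000 + 1;
    }
}
```

With the original numbers, 15 permits land exactly on the boundary (15 / 0.015 s = 1000 requests/s), so the strict inequality fails. At the measured 17 ms average, 18 permits is the minimum that satisfies it. Neither figure accounts for GC pauses.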
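A 160 ms GC pause leaves only 840 ms of effective service time per second. Even so, 50 permits can still serve about 50 × 840 ms / 17 ms ≈ 2470 requests per second, far above the 1000 QPS peak. On average, then, the enlarged permit count was theoretically sufficient.

The pause, however, does not spread its cost evenly. Requests caught in flight hold their permits for the whole pause. Requests that arrive while the JVM is stopped pile up and compete for permits all at once when it resumes. That burst exceeded the available permits.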
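Averages hide this kind of burst. The sketch below is a deliberately simplified, hypothetical model, not a reproduction of the production system. It makes these assumptions:

- One request arrives per millisecond (1000 QPS).
- Each request holds a permit for 17 ms of running time.
- A single stop‑the‑world pause freezes everything for 160 ms.
- Arrivals during the pause wait in the socket backlog and all try to acquire a permit the moment the pause ends.

Its numbers illustrate the mechanism only and are not meant to match the article's figures.

```java
import java.util.PriorityQueue;

// Hypothetical, simplified model: one arrival per ms, fixed service time,
// and one stop-the-world pause covering [pauseStartMs, pauseStartMs + pauseMs).
final class GcBurstSimulation {
    /** Returns how many arrivals find no free permit (the REJECTED_SEMAPHORE_EXECUTION case). */
    static int rejectedRequests(int permits, int serviceMs, int pauseStartMs, int pauseMs, int durationMs) {
        int pauseEndMs = pauseStartMs + pauseMs;
        PriorityQueue<Integer> endTimes = new PriorityQueue<>(); // completion time of each held permit
        int buffered = 0; // arrivals waiting in the backlog while the JVM is paused
        int rejected = 0;
        for (int t = 0; t < durationMs; t++) {
            if (t >= pauseStartMs && t < pauseEndMs) {
                buffered++; // nothing runs and nothing releases during the pause
                continue;
            }
            // Release the permits of requests that have finished by time t.
            while (!endTimes.isEmpty() && endTimes.peek() <= t) {
                endTimes.poll();
            }
            int arrivals = 1 + buffered; // right after a pause, the whole backlog arrives at once
            buffered = 0;
            for (int i = 0; i < arrivals; i++) {
                if (endTimes.size() >= permits) {
                    rejected++;
                    continue;
                }
                int end = t + serviceMs;
                if (t < pauseStartMs && end > pauseStartMs) {
                    end += pauseMs; // in flight when the pause hit: holds its permit longer
                }
                endTimes.add(end);
            }
        }
        return rejected;
    }
}
```

In this model, rejectedRequests(50, 17, 100, 160, 1000) returns 127. When the pause ends, 161 requests compete for the 34 permits not held by stalled in‑flight requests. With pauseMs set to 0 it returns 0, because steady‑state concurrency never exceeds 17.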
Buffering Strategies
Two main approaches can handle bursts during GC:
Use java.util.concurrent.Semaphore with tryAcquire() (non‑blocking) or acquire() (blocking), so that a request arriving during a burst can wait for a permit instead of being rejected outright.
Use a thread pool with a larger maximumPoolSize and/or a BlockingQueue to hold waiting tasks, optionally adding a RejectedExecutionHandler for overflow.
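The semaphore option can be sketched with a hypothetical SemaphoreLimiter wrapper. It uses Semaphore.tryAcquire(timeout, unit), so a caller waits a bounded time for a permit before falling back; the wait might, for example, be chosen to cover a typical GC pause. A wait of 0 behaves like the non‑blocking tryAcquire(), and a very long wait approaches acquire().

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical wrapper: waits up to maxWaitMillis for a permit before falling back,
// instead of rejecting immediately like Hystrix's semaphore isolation.
final class SemaphoreLimiter {
    private final Semaphore permits;
    private final long maxWaitMillis;

    SemaphoreLimiter(int maxConcurrent, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrent);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Runs task if a permit is obtained within maxWaitMillis; otherwise returns fallback's value. */
    <T> T execute(Supplier<T> task, Supplier<T> fallback) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            return fallback.get();
        }
        if (!acquired) {
            return fallback.get(); // timed out: treat like a rejection
        }
        try {
            return task.get();
        } finally {
            permits.release(); // always hand the permit back
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

The bounded wait trades a little latency for fewer rejections. If the timeout is too long, callers can pile up behind a genuinely overloaded dependency, so it should stay well below the caller's own timeout.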
Thread pools offer more flexibility but introduce context‑switch overhead and slower scaling during spikes.
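The thread‑pool option can be sketched with the JDK's ThreadPoolExecutor. The factory below is hypothetical. A bounded ArrayBlockingQueue absorbs short bursts. Tasks beyond maximumPoolSize plus the queue capacity go to a RejectedExecutionHandler, which here only counts them.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory: bounded queue as a burst buffer, custom handler for overflow.
final class BufferedPool {
    static ThreadPoolExecutor create(int corePoolSize, int maximumPoolSize, int queueCapacity,
                                     AtomicInteger rejectedCount) {
        // Overflow handler: only counts the dropped task; a real service would run a fallback.
        RejectedExecutionHandler onOverflow = (task, executor) -> rejectedCount.incrementAndGet();
        return new ThreadPoolExecutor(
                corePoolSize,
                maximumPoolSize,
                60, TimeUnit.SECONDS,                    // idle non-core threads exit after 60 s
                new ArrayBlockingQueue<>(queueCapacity), // bounded buffer for bursts
                onOverflow);
    }
}
```

One detail sits behind the slower scaling mentioned above. Once corePoolSize threads are running, ThreadPoolExecutor queues new tasks rather than starting threads. It only adds threads, up to maximumPoolSize, when the queue is full.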
Conclusion
We resolved a long‑standing hidden issue by recognizing how GC pauses affect semaphore‑based rate limiting, and we raised the permit count to at least 95. Fixing the JDK bug alone would not have removed the need for proper concurrency sizing, because any stop‑the‑world pause can produce the same kind of burst.
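For reference, this is the non‑blocking tryAcquire() behind Hystrix's semaphore isolation (its TryableSemaphoreActual class). A caller that finds no free permit is rejected immediately rather than queued, which is why a post‑GC burst turns directly into REJECTED_SEMAPHORE_EXECUTION errors. It is shown here as a standalone class, with a Supplier standing in for Hystrix's HystrixProperty<Integer>:
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class TryableSemaphoreActual {
    private final Supplier<Integer> numberOfPermits; // HystrixProperty<Integer> in Hystrix
    private final AtomicInteger count = new AtomicInteger(0);

    TryableSemaphoreActual(Supplier<Integer> permits) { this.numberOfPermits = permits; }

    public boolean tryAcquire() {
        int currentCount = count.incrementAndGet(); // optimistically take a permit
        if (currentCount > numberOfPermits.get()) {
            count.decrementAndGet(); // over the limit: undo and reject without waiting
            return false;
        }
        return true;
    }

    public void release() { count.decrementAndGet(); } // once per successful tryAcquire()
}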
Java Interview Crash Guide
