How a Faulty Lazy-Loading Design Caused Thread‑Pool Exhaustion and How to Fix It
This post walks through a production incident step by step. A flawed lazy-loading mechanism for KMSClient re-ran initialization on every request, which blocked threads, exhausted the shared thread pool, and triggered RejectedExecutionException alerts. The investigation led to a concrete code fix, improved monitoring, and better thread-pool isolation.
Scenario Description
Time: 08:24 one morning
Symptom: Massive RejectedExecutionException alerts on node 172.xx.84.113
Recent Operations: No code changes, but service pods were restarted due to resource eviction
Root Cause Summary
Investigation revealed that the problem originated from the lazy-loading implementation of KMSClient in EncryptUtil. The code used a StringBuffer flag (is_init) together with synchronized to ensure a single initialization, but the flag check and the flag update do not agree under concurrency.
Problem Code
import java.util.Arrays;
import lombok.extern.slf4j.Slf4j;

@Slf4j
public class EncryptUtil {
    // ① lazy-load flag: empty means "not initialized", "1" means "initialized"
    private static final StringBuffer is_init = new StringBuffer("");

    /** data encryption */
    public static String encrypt(String plaintext) {
        try {
            if (StringUtils.isEmpty(plaintext)) {
                return null;
            }
            // ② check lazy-load flag (performed outside the lock)
            if (!"1".equals(is_init.toString())) { // initialization entry
                init();
            }
            return Util.encryptDataForVersion(plaintext, "logic_sharding", "v1");
        } catch (Exception e) {
            log.error("Data encryption failed", e);
            return null;
        }
    }

    /** data decryption */
    public static String decrypt(String cipherText) { ... }

    /** client initialization */
    private static void init() {
        // ③ synchronized block: the flag is NOT re-checked after the lock is acquired
        synchronized (EncryptUtil.class) {
            KMSClient.initSecurity(Arrays.asList("logic_sharding"));
            // BUG: append() grows the flag ("1", "11", "111", ...) instead of setting it
            is_init.append("1");
        }
    }
}
The is_init flag starts as an empty string. When multiple threads call EncryptUtil.encrypt concurrently on a freshly started pod, each sees the empty flag, passes the if check, and calls init(). Only one thread acquires the monitor lock; the others block. Because init() does not re-check the flag inside the synchronized block, every waiting thread re-initializes in turn and appends another "1". Once two threads have run init(), the flag is "11" (then "111", and so on), which never equals "1" again, so the check at ② fails for every later request. From that point on, every request re-enters the synchronized block and re-initializes the heavy KMSClient (a 2‑3 s network call). This creates a cascade of blocked threads that quickly exhausts the shared thread pool.
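The flag behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the production code: `BrokenLazyInitDemo` and `slowInit()` are hypothetical stand-ins for EncryptUtil and KMSClient.initSecurity, and the 100 ms sleep is an assumed substitute for the real network call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for EncryptUtil with the same flag logic. */
class BrokenLazyInitDemo {
    private static final StringBuffer isInit = new StringBuffer("");
    static final AtomicInteger initCalls = new AtomicInteger();

    static void encrypt() {
        if (!"1".equals(isInit.toString())) { // check outside the lock
            init();
        }
    }

    private static void init() {
        synchronized (BrokenLazyInitDemo.class) {
            slowInit();          // no re-check: every waiter initializes again
            isInit.append("1");  // flag grows instead of being set
        }
    }

    /** Assumed substitute for the expensive KMSClient.initSecurity call. */
    private static void slowInit() {
        initCalls.incrementAndGet();
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Releases `threads` callers at once, adds one late caller, returns the flag. */
    static String run(int threads) {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                encrypt();
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        encrypt(); // a request arriving long after startup
        return isInit.toString();
    }

    public static void main(String[] args) {
        String flag = run(4);
        System.out.println("flag=" + flag + " initCalls=" + initCalls.get());
    }
}
```

On most runs all four startup threads pass the check before the first one finishes, so the flag ends up longer than "1" and the late caller initializes again as well. The flag length always equals the number of init calls, which makes the repeated initialization easy to see.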
Evidence
Spike in RejectedExecutionException (AbortPolicy) on a single pod.
Corresponding rise in threads in BLOCKED state, indicating monitor lock contention.
Code sections using synchronized were inspected; the culprit was the EncryptUtil lazy‑load block.
Further analysis showed that every business flow involving encryption (rule fetching, order receipt, order update, etc.) suffered the 2‑3 s delay caused by repeated KMSClient initialization.
Timeline showed the issue started when a new pod was launched after host eviction; early requests triggered the concurrent initialization.
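The BLOCKED-thread signal in the evidence above can also be sampled from inside a JVM. This is a minimal sketch using the standard `java.lang.management` API; the class name `BlockedThreadProbe` and the `demo` harness are illustrative assumptions, not the tooling used during the incident.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class BlockedThreadProbe {
    /** Counts live threads currently waiting to enter a monitor (state BLOCKED). */
    static long countBlocked() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long blocked = 0;
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            // Entries are null for threads that died between the two calls.
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                blocked++;
            }
        }
        return blocked;
    }

    /** Holds a lock while `contenders` threads queue up on it, then samples. */
    static long demo(int contenders) {
        final Object lock = new Object();
        Thread[] threads = new Thread[contenders];
        long seen;
        synchronized (lock) {
            for (int i = 0; i < contenders; i++) {
                threads[i] = new Thread(() -> {
                    synchronized (lock) {
                        // acquire and release immediately
                    }
                });
                threads[i].start();
            }
            for (Thread t : threads) {
                while (t.getState() != Thread.State.BLOCKED) {
                    Thread.yield(); // wait until the contender is parked on the monitor
                }
            }
            seen = countBlocked();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("blocked threads observed: " + demo(3));
    }
}
```

A sustained rise in this count alongside rejected tasks points at monitor contention rather than slow downstream I/O, which is the distinction that narrowed the search to synchronized blocks.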
Proof of Concept
High concurrency tests confirmed that repeatedly initializing KMSClient quickly fills the thread queue, reproducing the production outage.
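A reduced version of such a test might look like the sketch below. The pool sizes, the 50 ms sleep, and the names `PoolExhaustionDemo` and `slowRequest` are assumptions chosen to keep the example fast; the real service's pool configuration is not given in the source.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolExhaustionDemo {
    private static final Object LOCK = new Object();

    /** Stand-in for a request that re-initializes the client under a global lock. */
    private static void slowRequest() {
        synchronized (LOCK) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Submits `requests` tasks in a burst and returns how many were rejected. */
    static int run(int requests) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(2),
                new ThreadPoolExecutor.AbortPolicy());
        int rejected = 0;
        for (int i = 0; i < requests; i++) {
            try {
                pool.execute(PoolExhaustionDemo::slowRequest);
            } catch (RejectedExecutionException e) {
                rejected++; // both workers busy and the queue is full
            }
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected=" + run(10));
    }
}
```

Because every task serializes on the same lock, the two workers and the two queue slots fill almost immediately and the rest of the burst is rejected, which mirrors the AbortPolicy alerts seen in production.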
Improvements
Code Fix
Replace the StringBuffer flag with an AtomicBoolean and add double‑checked locking.
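Before the full class, the pattern can be checked in isolation. The sketch below assumes a hypothetical `slowInit()` in place of KMSClient.initSecurity and counts how often it runs when several threads race on the guard.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class DoubleCheckedInitDemo {
    private static final AtomicBoolean initialized = new AtomicBoolean(false);
    static final AtomicInteger initCalls = new AtomicInteger();

    static void ensureInit() {
        if (initialized.get()) {
            return; // fast path: no lock once initialized
        }
        synchronized (DoubleCheckedInitDemo.class) {
            if (initialized.get()) {
                return; // another thread finished while this one waited
            }
            slowInit();
            initialized.set(true); // set only after initialization succeeded
        }
    }

    /** Assumed substitute for the expensive KMSClient.initSecurity call. */
    private static void slowInit() {
        initCalls.incrementAndGet();
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Races `threads` callers on ensureInit() and returns the init count. */
    static int run(int threads) {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                ensureInit();
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return initCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("initCalls=" + run(8));
    }
}
```

The second check inside the lock is what the original code lacked: threads that queued up during the first initialization return immediately instead of repeating it. Setting the flag after the call also means a failed initialization is retried by the next request.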
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;
import lombok.extern.slf4j.Slf4j;

@Slf4j
public class EncryptUtil {
    /** initialization flag */
    private static final AtomicBoolean is_init = new AtomicBoolean(false);

    /** data encryption */
    public static String encrypt(String plaintext) {
        try {
            if (StringUtils.isEmpty(plaintext)) {
                return null;
            }
            // initialize client if not done yet
            init();
            return AESUtil.encryptDataForVersion(plaintext, "logic_sharding", "v1");
        } catch (Exception e) {
            log.error("Data encryption failed", e);
            return null;
        }
    }

    /** data decryption */
    public static String decrypt(String cipherText) { ... }

    /** client initialization */
    private static void init() {
        // first check: skip the lock entirely once initialized
        if (is_init.get()) { return; }
        synchronized (EncryptUtil.class) {
            // second check: another thread may have initialized while we waited
            if (is_init.get()) { return; }
            log.info("Encryption/decryption client initialization begin");
            KMSClient.initSecurity(Arrays.asList("logic_sharding"));
            log.info("Encryption/decryption client initialization end");
            is_init.set(true);
        }
    }
}
Monitoring & Alerts
Enable dynamic thread‑pool alerts and configure alert recipients to catch queue‑full situations early.
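The source does not name the dynamic thread-pool framework in use, so the following is only a generic sketch of the underlying check, built on the standard `ThreadPoolExecutor` API. The class name, the `shouldAlert` helper, and the 0.7 threshold are illustrative assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolSaturationCheck {
    /** Fraction of the work queue currently in use, from 0.0 to 1.0. */
    static double queueUsage(ThreadPoolExecutor pool) {
        int size = pool.getQueue().size();
        int capacity = size + pool.getQueue().remainingCapacity();
        return capacity == 0 ? 1.0 : (double) size / capacity;
    }

    /** True when the queue is at or above the given usage threshold. */
    static boolean shouldAlert(ThreadPoolExecutor pool, double threshold) {
        return queueUsage(pool) >= threshold;
    }

    /** Fills a 1-thread pool with a 4-slot queue to 3 queued tasks and samples it. */
    static double demo() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(4));
        CountDownLatch release = new CountDownLatch(1);
        Runnable stuck = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int i = 0; i < 4; i++) {
            pool.execute(stuck); // first task goes to the worker, the next 3 are queued
        }
        double usage = queueUsage(pool);
        release.countDown();
        pool.shutdown();
        return usage;
    }

    public static void main(String[] args) {
        System.out.println("queue usage=" + demo());
    }
}
```

Alerting on queue usage before it reaches 100% gives on-call staff a warning ahead of the first RejectedExecutionException rather than after it.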
Incident Response Process
Three-stage approach: fast detection, precise fault localization, and stable recovery. Use fine-grained monitoring, phone/instant-messaging alerts, and a dedicated on-call rotation.
Thread‑Pool Isolation
Separate thread pools per business scenario (add, modify, cancel, fulfil) to prevent a single scenario from exhausting the global pool.
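One way to express this isolation is a small registry with one bounded executor per scenario. The sketch below is an assumption about shape only: the `ScenarioPools` class, the `Scenario` enum, and the pool sizes are hypothetical and not taken from the service.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ScenarioPools {
    enum Scenario { ADD, MODIFY, CANCEL, FULFIL }

    private final Map<Scenario, ThreadPoolExecutor> pools = new EnumMap<>(Scenario.class);

    ScenarioPools(int threadsPerPool, int queueCapacity) {
        for (Scenario s : Scenario.values()) {
            pools.put(s, new ThreadPoolExecutor(
                    threadsPerPool, threadsPerPool, 0L, TimeUnit.MILLISECONDS,
                    new ArrayBlockingQueue<>(queueCapacity),
                    new ThreadPoolExecutor.AbortPolicy()));
        }
    }

    /** Runs the task on the pool that belongs to its scenario only. */
    void submit(Scenario scenario, Runnable task) {
        pools.get(scenario).execute(task);
    }

    void shutdown() {
        pools.values().forEach(ThreadPoolExecutor::shutdown);
    }

    /** Saturates MODIFY, then checks that ADD still accepts work. */
    static boolean demo() {
        ScenarioPools pools = new ScenarioPools(1, 1);
        CountDownLatch release = new CountDownLatch(1);
        Runnable stuck = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        boolean modifyRejected = false;
        try {
            for (int i = 0; i < 3; i++) {
                pools.submit(Scenario.MODIFY, stuck); // third submit overflows the pool
            }
        } catch (RejectedExecutionException e) {
            modifyRejected = true;
        }
        boolean addAccepted = false;
        try {
            pools.submit(Scenario.ADD, () -> { });
            addAccepted = true;
        } catch (RejectedExecutionException e) {
            // ADD would only be rejected if the pools were shared
        }
        release.countDown();
        pools.shutdown();
        return modifyRejected && addAccepted;
    }

    public static void main(String[] args) {
        System.out.println("isolated=" + demo());
    }
}
```

With this layout a stall in one flow, such as the repeated KMSClient initialization, can still reject that flow's own tasks, but the other scenarios keep their threads and queue slots.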
