Why Full GC Spikes During Sales Events: Uncovering DBCP Connection‑Pool Pitfalls
During a high‑traffic promotion, an interface experienced frequent timeouts and Full GC pauses over 500 ms, which were traced to a misbehaving DBCP connection pool that failed to keep connections alive, causing massive old‑generation garbage and severe GC latency.
Introduction
This article is part of the "Online Issue Handling Cases" series, which uses real‑world incidents to teach readers how to discover, locate, and resolve problems. It walks through a case where unusually long Full GC times were observed and ultimately linked to a database connection‑pool keep‑alive issue.
1. Problem Description
During a major sales promotion, an API saw a surge in timeout occurrences and Full GC pauses exceeding 500 ms.
2. Application Basics
Container: 8C12G
JVM options:
-XX:+UseConcMarkSweepGC -Xms6144m -Xmx6144m -Xmn2048m -XX:ParallelGCThreads=8 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:+ParallelRefProcEnabled
Database: MySQL
Connection pool: DBCP
3. Investigation Process
Long GC times indicated many garbage objects in memory.
Suspected a memory leak, but a heap dump taken after Full GC showed memory being reclaimed normally, so a leak was ruled out.
Exported heap dumps before and after Full GC and compared "retained size"; discovered that many database‑related objects were reclaimed by the Full GC.
Analyzed the heap dump with OQL and found that the number of connection objects in the heap exceeded the pool's maxActive limit, confirming that many of them were stale connections no longer held by the pool.
Concluded that stale connections entering the old generation caused the prolonged Full GC.
Adjusted timeBetweenEvictionRunsMillis from 1 minute to 10 seconds, but the issue persisted.
Reviewed DBCP source code: the GenericObjectPool.Evictor task evicts idle connections based on minEvictableIdleTimeMillis. If testWhileIdle is enabled, validationQuery runs against idle connections, but a successful validation does not reset the connection’s idle time. During low‑traffic periods, connections that sit unused are therefore still evicted once they exceed minEvictableIdleTimeMillis, and each evicted connection leaves behind a heavy object graph.
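The eviction rule described above can be sketched as a small model. This is not DBCP source code: the names IdleEvictionSketch, PooledConn and runEvictor are hypothetical, and the 30-minute threshold in main is only an assumed value for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified model of the eviction rule described above; NOT DBCP source code.
// IdleEvictionSketch, PooledConn and runEvictor are hypothetical names.
final class IdleEvictionSketch {

    static final class PooledConn {
        final int id;
        // Set when the application returns the connection; validation never updates it.
        final long lastReturnedAtMillis;

        PooledConn(int id, long lastReturnedAtMillis) {
            this.id = id;
            this.lastReturnedAtMillis = lastReturnedAtMillis;
        }
    }

    /** Runs one evictor pass over the idle list and returns how many connections were evicted. */
    static int runEvictor(List<PooledConn> idle, long nowMillis,
                          long minEvictableIdleTimeMillis, boolean testWhileIdle) {
        int evicted = 0;
        for (Iterator<PooledConn> it = idle.iterator(); it.hasNext();) {
            PooledConn conn = it.next();
            long idleFor = nowMillis - conn.lastReturnedAtMillis;
            if (minEvictableIdleTimeMillis > 0 && idleFor > minEvictableIdleTimeMillis) {
                it.remove(); // evicted purely because of idle time
                evicted++;
            } else if (testWhileIdle) {
                // A validationQuery would run here. It proves the connection works,
                // but the idle clock keeps ticking because lastReturnedAtMillis is unchanged.
            }
        }
        return evicted;
    }

    public static void main(String[] args) {
        long thirtyMinutes = 30 * 60 * 1000L; // assumed threshold, for illustration only
        List<PooledConn> idle = new ArrayList<>();
        for (int id = 0; id < 3; id++) {
            idle.add(new PooledConn(id, 0L));
        }
        System.out.println("evicted after 1 min:  " + runEvictor(idle, 60_000L, thirtyMinutes, true));
        System.out.println("evicted after 31 min: " + runEvictor(idle, 31 * 60_000L, thirtyMinutes, true));
    }
}
```

In this model, running the evictor more often only changes how soon an expired connection is noticed, not whether it expires, which is consistent with the shorter timeBetweenEvictionRunsMillis having no effect. Passing 0 as the threshold disables the idle-time branch entirely.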
Reasoned why the problem only appeared during the promotion: under normal load, GC is infrequent and stale connections are reclaimed in the young generation; during the promotion, frequent GC pushes stale connections into the old generation, dramatically increasing Full GC duration.
Identified the root cause: the connection pool lacked true keep‑alive capability, leading to massive connection churn and long Full GC pauses.
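The reasoning about why stale connections only reach the old generation under promotion load can be made concrete with a rough estimate: an object is tenured after surviving roughly MaxTenuringThreshold young collections. The sketch below uses hypothetical numbers (connection lifetime and young GC intervals are assumptions, not measurements from this incident) and ignores survivor-space overflow and adaptive tenuring, both of which can promote objects earlier.

```java
// Back-of-the-envelope model with hypothetical numbers; it ignores survivor-space
// overflow and adaptive tenuring, both of which can promote objects even earlier.
final class PromotionEstimate {

    /** Approximate time an object must stay reachable before it is promoted to the old generation. */
    static long millisUntilTenured(int maxTenuringThreshold, long youngGcIntervalMillis) {
        return maxTenuringThreshold * youngGcIntervalMillis;
    }

    /** True if an object with the given lifetime is expected to be promoted before it dies. */
    static boolean reachesOldGen(long lifetimeMillis, int maxTenuringThreshold,
                                 long youngGcIntervalMillis) {
        return lifetimeMillis >= millisUntilTenured(maxTenuringThreshold, youngGcIntervalMillis);
    }

    public static void main(String[] args) {
        long connectionLifetime = 30 * 60 * 1000L; // assumed: evicted after 30 minutes idle
        int threshold = 6;                         // CMS default cited in this article
        long quietInterval = 10 * 60 * 1000L;      // assumed: one young GC every 10 minutes
        long promoInterval = 5 * 1000L;            // assumed: one young GC every 5 seconds

        System.out.println("normal load, promoted:    "
                + reachesOldGen(connectionLifetime, threshold, quietInterval));
        System.out.println("promotion load, promoted: "
                + reachesOldGen(connectionLifetime, threshold, promoInterval));
    }
}
```

With these assumed values, a connection needs to live about 60 minutes to be tenured under normal load but only about 30 seconds under promotion load, so nearly every pooled connection ends up in the old generation before it is evicted.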
4. Solution
Switch to the G1 garbage collector.
Set minEvictableIdleTimeMillis to 0, so that the evictor no longer removes connections solely because they have been idle.
5. Problem Summary
The DBCP connection pool does not provide true keep‑alive; idle connections are frequently evicted and recreated. Each connection carries a large object graph via a phantom reference; when such connections survive long enough to reach the old generation, they are reclaimed by the costly "mark‑sweep" algorithm, causing prolonged Full GC and downstream timeouts.
6. Extended Knowledge
Druid also suffers from keep‑alive limitations, though newer versions offer a KeepAlive option (unverified).
Druid’s validationQuery is often bypassed because the MySQL driver implements pingInternal, so the query is not executed.
Both DBCP and Druid hand out idle connections in LIFO (FILO) order; under low load, only the most recently returned connections are reused, while the rest sit idle and are repeatedly evicted and recreated.
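The reuse pattern above can be illustrated by comparing a stack-like idle list with a queue-like one under strictly sequential requests. This is a minimal sketch, not the pools' actual data structures; IdleOrderSketch and distinctConnectionsUsed are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Toy model of an idle-connection list; not the real DBCP or Druid implementation.
final class IdleOrderSketch {

    /**
     * Serves the given number of sequential requests (borrow, use, return) and
     * reports how many distinct connections were ever handed out.
     */
    static int distinctConnectionsUsed(int poolSize, int requests, boolean lifo) {
        Deque<Integer> idle = new ArrayDeque<>();
        for (int id = 0; id < poolSize; id++) {
            idle.addLast(id);
        }
        Set<Integer> used = new HashSet<>();
        for (int r = 0; r < requests; r++) {
            int conn = idle.pollFirst(); // always borrow from the head
            used.add(conn);
            if (lifo) {
                idle.addFirst(conn);     // stack-like: returned connection is next in line
            } else {
                idle.addLast(conn);      // queue-like: returned connection goes to the back
            }
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println("stack-like: " + distinctConnectionsUsed(10, 100, true) + " of 10 used");  // 1 of 10
        System.out.println("queue-like: " + distinctConnectionsUsed(10, 100, false) + " of 10 used"); // 10 of 10
    }
}
```

With the stack-like order, nine of the ten connections are never touched by traffic, so only the evictor ever looks at them.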
Phantom references require two GC cycles to be reclaimed; if they reside in the old generation, two Full GCs are needed, increasing memory pressure. Newer JVMs have optimized this to a single GC.
Similar effects arise from finalize methods.
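To make the phantom-reference point more tangible, here is a generic sketch of phantom-reference based cleanup. It is not the code of any particular JDBC driver, and every name in it (PhantomCleanupSketch, ResourceRef, track, drain) is hypothetical. The key detail is that the reference object itself is strongly reachable from a tracking set, so anything it carries stays in memory until the collector enqueues the reference and the cleanup code removes it.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Generic illustration of phantom-reference based cleanup; all names are hypothetical
// and this is not the code of any particular JDBC driver.
final class PhantomCleanupSketch {

    /** A phantom reference that carries the resources needing cleanup as a strong field. */
    static final class ResourceRef extends PhantomReference<Object> {
        final byte[] heavyResources;

        ResourceRef(Object owner, ReferenceQueue<Object> queue, byte[] heavyResources) {
            super(owner, queue);
            this.heavyResources = heavyResources;
        }
    }

    static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();
    // Keeps the reference objects (and therefore their payloads) strongly reachable.
    static final Set<ResourceRef> TRACKED = ConcurrentHashMap.newKeySet();

    static ResourceRef track(Object owner, int payloadBytes) {
        ResourceRef ref = new ResourceRef(owner, QUEUE, new byte[payloadBytes]);
        TRACKED.add(ref);
        return ref;
    }

    /** Drops tracking entries whose owners the collector has already found unreachable. */
    static int drain() {
        int released = 0;
        for (Object ref = QUEUE.poll(); ref != null; ref = QUEUE.poll()) {
            if (TRACKED.remove(ref)) {
                released++;
            }
        }
        return released;
    }

    public static void main(String[] args) {
        Object owner = new Object();
        ResourceRef ref = track(owner, 1024 * 1024);
        System.out.println("get() is always null: " + (ref.get() == null));
        System.out.println("payload still tracked: " + TRACKED.contains(ref));
        System.out.println("owner still in use: " + owner.hashCode());
    }
}
```

The payload can only be freed after a collection has discovered the owner to be unreachable and a later drain() has removed the entry, which is why objects of this kind tend to outlive the GC cycle that found them dead.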
CMS’s default MaxTenuringThreshold is 6, whereas ParallelGC and G1 default to 15.
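The effective threshold can be checked on a running JVM rather than taken from documentation. The sketch below assumes a HotSpot-based JVM, since it relies on the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean; the class name TenuringThresholdProbe is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Reads the effective MaxTenuringThreshold of the current JVM.
// Assumes a HotSpot-based JVM; other JVMs may not expose this MXBean or option.
final class TenuringThresholdProbe {

    static int maxTenuringThreshold() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return Integer.parseInt(bean.getVMOption("MaxTenuringThreshold").getValue());
    }

    public static void main(String[] args) {
        System.out.println("MaxTenuringThreshold = " + maxTenuringThreshold());
    }
}
```

Running it with the same GC flags as the production service shows the value that collector actually uses.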
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
