Resolving Unexpected 2‑Second Young GC Pauses in Java Rule Engine
A Java rule engine occasionally suffers 1‑2 second Young GC pauses after warm‑up. The cause is the JVM's dynamically computed promotion threshold, which can drop sharply and trigger a mass promotion of objects to the old generation. This article explains the root cause, walks through the GC logs, and gives the JVM tuning step that eliminates the long pauses.
1. Problem Description
The company's rule‑engine system is warmed up manually before each release. Once traffic is switched in, every node occasionally suffers a single 1‑2 second Young GC pause, while subsequent Young GCs stay within 20‑100 ms. Such pauses are intolerable because rule execution takes only a few milliseconds, and the resulting timeouts can cause order failures.
2. Problem Analysis
GC logs show that the long pause occurs during the Young GC phase and each long pause is accompanied by a promotion of objects from the young generation to the old generation.
Core JVM parameters (Oracle JDK 7)
<code>-Xms10G</code>
<code>-Xmx10G</code>
<code>-XX:NewSize=4G</code>
<code>-XX:PermSize=1g</code>
<code>-XX:MaxPermSize=4g</code>
<code>-XX:+</code>
First Young GC Log (after startup)
<code>2023-04-23T16:28:31.108+0800: [GC ... : 0.1444710 secs] 3544342K->374555K(3774912K), 0.1446290 secs] [Times: user=1.46 sys=0.09, real=0.15 secs]</code>
Long‑pause Young GC Log
<code>2023-04-23T17:18:28.514+0800: [GC ... : 1.5114000 secs] 3730075K->676858K(10066368K), 1.5114870 secs] [Times: user=6.32 sys=0.58, real=1.51 secs]</code>
The long pause shows a promotion of over 363 MB of objects to the old generation, which accounts for the extra time.
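When comparing many such lines, it can help to pull the numbers out programmatically. The following is a minimal sketch, assuming the abbreviated line shape shown in the two excerpts above; the class GcLineSummary and its methods are hypothetical names introduced here for illustration and are not part of any JDK tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original article.
class GcLineSummary {
    // Matches "<before>K-><after>K(<capacity>K), <pause> secs]".
    private static final Pattern TRANSITION =
            Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    final long beforeKb;
    final long afterKb;
    final long capacityKb;
    final double pauseSecs;

    private GcLineSummary(long beforeKb, long afterKb, long capacityKb, double pauseSecs) {
        this.beforeKb = beforeKb;
        this.afterKb = afterKb;
        this.capacityKb = capacityKb;
        this.pauseSecs = pauseSecs;
    }

    // Keeps the last transition found on the line, i.e. the one followed by
    // the overall pause time in the excerpts above.
    static GcLineSummary parse(String line) {
        Matcher m = TRANSITION.matcher(line);
        GcLineSummary last = null;
        while (m.find()) {
            last = new GcLineSummary(
                    Long.parseLong(m.group(1)),
                    Long.parseLong(m.group(2)),
                    Long.parseLong(m.group(3)),
                    Double.parseDouble(m.group(4)));
        }
        if (last == null) {
            throw new IllegalArgumentException("no heap transition in: " + line);
        }
        return last;
    }

    // Occupancy drop reported by the transition, in KB.
    long reclaimedKb() {
        return beforeKb - afterKb;
    }

    public static void main(String[] args) {
        String[] lines = {
            "2023-04-23T16:28:31.108+0800: [GC ... : 0.1444710 secs] 3544342K->374555K(3774912K), 0.1446290 secs]",
            "2023-04-23T17:18:28.514+0800: [GC ... : 1.5114000 secs] 3730075K->676858K(10066368K), 1.5114870 secs]"
        };
        for (String line : lines) {
            GcLineSummary s = parse(line);
            System.out.println(s.beforeKb + "K -> " + s.afterKb + "K, reclaimed "
                    + s.reclaimedKb() + "K, pause " + s.pauseSecs + " s");
        }
    }
}
```

Both collections reclaim a similar amount of memory, so the reclaimed volume alone does not explain the tenfold difference in pause time.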
3. Young Generation Promotion Mechanism
To better adapt to different memory situations, the JVM does not require an object’s age to reach MaxTenuringThreshold before promotion; if the total size of objects of the same age in Survivor exceeds half of Survivor, objects of that age or older are promoted directly.
The book *Deep Understanding of the Java Virtual Machine* describes the promotion age threshold as dynamically determined in this way, but the observed behavior differs slightly.
The JVM groups surviving objects by age and, starting from age 1, adds their sizes into a running total (the "total" column in the log). The first age at which this cumulative total exceeds the desired survivor size (half of Survivor) becomes the new promotion threshold. The update takes effect on the next GC, not the current one.
Example from the first GC log:
<code>Desired survivor size 214728704 bytes, new threshold 1 (max 15)</code>
<code>- age 1: 315529928 bytes, 315529928 total</code>
<code>- age 2: 40956656 bytes, 356486584 total</code>
<code>- age 3: 8408040 bytes, 364894624 total</code>
Since the total for age 1 exceeds the desired survivor size, the threshold becomes 1.
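The threshold calculation can be replayed on the numbers from this log. The sketch below is an illustration written for this article, assuming the cumulative rule described above; it is not the JVM's source code, and the class and method names are hypothetical.

```java
// Illustrative model of the dynamic threshold rule; names are hypothetical.
class TenuringThresholdSketch {

    // sizesByAge[0] holds the bytes of age-1 objects, sizesByAge[1] age-2, and so on.
    // Returns the first age whose cumulative total exceeds desiredSurvivorBytes,
    // or maxThreshold if the total never exceeds it.
    static int computeThreshold(long[] sizesByAge, long desiredSurvivorBytes, int maxThreshold) {
        long total = 0;
        for (int age = 1; age <= sizesByAge.length && age < maxThreshold; age++) {
            total += sizesByAge[age - 1];
            if (total > desiredSurvivorBytes) {
                return age;
            }
        }
        return maxThreshold;
    }

    public static void main(String[] args) {
        // Age table and desired survivor size taken from the first GC log.
        long[] sizes = {315529928L, 40956656L, 8408040L};
        long desired = 214728704L;
        System.out.println("new threshold = " + computeThreshold(sizes, desired, 15));
    }
}
```

With the logged values the running total already exceeds the desired survivor size at age 1, which matches the "new threshold 1 (max 15)" line in the log.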
4. Solution
The dynamic age determination cannot be disabled, so the workaround is to enlarge Survivor so that the cumulative size of temporarily reachable objects never exceeds half of Survivor.
In the example, age 1 objects total 315,529,928 bytes (about 301 MiB), while the desired survivor size (half of Survivor) is 214,728,704 bytes (about 205 MiB). Raising each Survivor space above roughly 600 MiB (e.g., 800 MiB) lifts the desired survivor size to about 400 MiB, which prevents the threshold from dropping to 1.
Survivor size is adjusted via -XX:SurvivorRatio, which defines the ratio between Eden and each Survivor space.
<code>-XX:SurvivorRatio=8</code>
This means Eden : S0 : S1 = 8 : 1 : 1, so each Survivor occupies 1/10 of the young generation.
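The ratio can be turned into byte counts with a few lines of arithmetic. This is a rough sketch, assuming the young generation is exactly 4 GiB (from -XX:NewSize=4G) and ignoring any rounding or alignment the JVM applies; SurvivorSizing and its methods are hypothetical names. The 50 passed below models "half of Survivor", which corresponds to the default of -XX:TargetSurvivorRatio.

```java
// Back-of-the-envelope sizing; ignores JVM alignment, so results are approximate.
class SurvivorSizing {

    // Young generation = Eden + S0 + S1 with Eden : S0 : S1 = ratio : 1 : 1.
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    // Portion of one Survivor space that may be occupied before the
    // promotion threshold is lowered (50 percent models "half of Survivor").
    static long desiredSurvivorBytes(long survivorBytes, int targetPercent) {
        return survivorBytes * targetPercent / 100;
    }

    public static void main(String[] args) {
        long young = 4L << 30; // assumed: 4 GiB young generation
        for (int ratio : new int[] {8, 3}) {
            long survivor = survivorBytes(young, ratio);
            long desired = desiredSurvivorBytes(survivor, 50);
            System.out.println("SurvivorRatio=" + ratio
                    + " survivor=" + survivor + " bytes"
                    + " desired=" + desired + " bytes");
        }
    }
}
```

For a ratio of 8 this estimate comes out within a fraction of a percent of the "Desired survivor size 214728704 bytes" printed in the GC log; the small gap is presumably due to the alignment the sketch ignores.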
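Using -XX:SurvivorRatio=3 gives Eden : S0 : S1 = 3 : 1 : 1, so the two Survivor spaces together take 40% of the young generation (≈1.6 GiB of 4 GiB) and each one is about 819 MiB. The desired survivor size then rises to about 410 MiB, comfortably larger than the roughly 301 MiB of age 1 objects, so the promotion threshold stays high and the long pause disappears.
<code>-XX:SurvivorRatio=3</code>
After applying this change, Young GC pauses stabilized at 30‑100 ms with no long pauses.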
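One way to confirm that a new ratio actually took effect is to read the Survivor pool size from inside the running JVM. The sketch below uses the standard java.lang.management API; the class name SurvivorPoolCheck is hypothetical, and matching pools by the substring "Survivor" is an assumption that holds for the generational HotSpot collectors but not necessarily for every collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical runtime check; pool naming is collector-specific.
class SurvivorPoolCheck {

    // Returns the committed size in bytes of the first memory pool whose name
    // contains "Survivor", or -1 if no such pool is reported.
    static long survivorCommittedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().contains("Survivor")) {
                MemoryUsage usage = pool.getUsage();
                if (usage != null) {
                    return usage.getCommitted();
                }
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        // Print every pool so the Survivor entry can be checked by eye as well.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.println(pool.getName() + ": " + pool.getUsage());
        }
        System.out.println("survivor committed bytes = " + survivorCommittedBytes());
    }
}
```

The same figure is visible in the heap summaries printed by -XX:+PrintHeapAtGC, so this check is mainly useful where GC logs are not at hand.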
5. Extension
Why is promoting 300 MB slower than reclaiming 3 GB?
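A copying collector's work is proportional to the volume of live objects it has to copy, not to the amount of garbage it reclaims or to the total heap size. The roughly 3 GB of dead objects therefore cost almost nothing, while the 363 MB promotion dominates the 1.5 s pause.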
Why is promotion costly?
Promotion involves cross‑generation copying and additional bookkeeping (e.g., pointer updates), which adds overhead compared to intra‑generation copying.
6. Local Code Simulation
The following Java program reproduces the issue on JDK 7:
<code>// JDK 7 example
import java.util.ArrayList;
import java.util.List;

public class PromotionTest {

    public static void main(String[] args) {
        // Simulate initialization: 5 x 4 MB of long-lived data kept reachable via dataList
        List<Object> dataList = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            dataList.add(new InnerObject());
        }
        // Simulate traffic: short-lived 4 MB objects that fill Eden
        for (int i = 0; i < 73; i++) {
            if (i == 72) {
                // Point at which the author expects the Young GC to run
                System.out.println("Execute young gc...Adjust promotion threshold to 1");
            }
            new InnerObject();
        }
        System.out.println("Execute full gc...dataList has been promoted to cms old space");
        System.gc();
    }

    // Allocates a 4 MB array and touches every byte so the memory is really used
    public static byte[] createData() {
        int dataSize = 1024 * 1024 * 4; // 4 MB
        byte[] data = new byte[dataSize];
        for (int j = 0; j < dataSize; j++) {
            data[j] = 1;
        }
        return data;
    }

    static class InnerObject {
        private Object data;

        public InnerObject() {
            this.data = createData();
        }
    }
}
</code>
JVM options used for testing:
<code>-server -Xmn400M -XX:SurvivorRatio=9 -Xms1000M -Xmx1000M -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintHeapAtGC -XX:+PrintReferenceGC -XX:+PrintGCApplicationStoppedTime -XX:+UseConcMarkSweepGC</code>
References
[1] *Deep Understanding of the Java Virtual Machine* – Zhou Zhiming
[2] https://blog.codecentric.de/en/2012/08/useful-jvm-flags-part-5-young-generation-garbage-collection/
JD Cloud Developers