Handling Data Surge in a Data Push Platform: JVM Tuning, Flow Control, and Performance Optimization
This article analyzes the challenges of data‑burst scenarios in a data‑push platform, evaluates traditional throttling methods, presents JVM‑level tuning and a custom heap‑usage based flow‑control mechanism, and validates the solution through extensive pressure testing, demonstrating significant reductions in full GC frequency and overall push latency.
Introduction
In everyday work, data‑burst scenarios such as e‑commerce flash sales cause a massive influx of requests within a short time, overwhelming system resources and risking crashes if not pre‑emptively mitigated.
Business Background
Our application generates large volumes of business data (e.g., products, orders) that external platforms need to subscribe to. A data‑push platform consumes messages from an MQ, processes them, and asynchronously pushes the data to various partners.
Technical Background
Solution Selection
Typical high-concurrency strategies include caching, circuit breaking (degradation), and rate limiting. Caching is unsuitable for our data-push scenario, and circuit breaking can handle downstream partner failures but does not address the root cause of limited system resources. We therefore chose rate limiting as the primary strategy.
Solution Application
Common rate-limiting algorithms are the token bucket, the leaky bucket, and counters. We chose a counter-based approach, which can be implemented with an AtomicInteger, a Semaphore, or a bounded thread pool; a generic sketch of the idea follows below.
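As a generic illustration of the counter idea (a sketch only, not the platform's actual code; the class and method names are assumptions), a Semaphore can cap the number of concurrent pushes and reject work once all permits are in use:

import java.util.concurrent.Semaphore;

// Generic counter-style limiter built on a Semaphore; illustrative sketch only.
public class ConcurrencyLimiter {

    private final Semaphore permits;

    public ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the task if a permit is available, otherwise rejects it immediately.
    public boolean tryRun(Runnable task) {
        if (!permits.tryAcquire()) {
            return false; // over the limit: caller can retry, queue, or drop
        }
        try {
            task.run();
            return true;
        } finally {
            permits.release();
        }
    }
}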
With synchronous pushes, throughput is naturally bounded by the number of MQ consumer threads (20 by default). Asynchronous pushes go through Apache HttpAsyncClient (Reactor-pattern based, with callback handling), so no such bound exists; instead we limit flow per business data type based on the current number of in-flight pushes, as the snippet below shows.
// In-flight push count for this business type on the current node
Integer flowCount = BizFlowLimitUtil.get(data.getBizType());
// Flow-control threshold configured for this business type
Integer overload = BizFlowLimitUtil.getOverloadOrDefault(data.getBizType(), this.defaultLimit);
if (flowCount >= overload) {
    throw new OverloadException("Business type: " + data.getBizType() + " is overloaded; threshold: " + overload + ", current load: " + flowCount);
}
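The snippet above shows only the overload check; the counting utility itself is not shown in the original. A minimal sketch of how a per-business-type in-flight counter might work, assuming the count is incremented before each asynchronous push and decremented in its callback (BizFlowCounter and its methods are hypothetical names, not the platform's actual BizFlowLimitUtil):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-business-type in-flight counter; not the platform's actual implementation.
public final class BizFlowCounter {

    private static final Map<String, AtomicInteger> IN_FLIGHT = new ConcurrentHashMap<>();

    private BizFlowCounter() {
    }

    // Current number of in-flight pushes for the given business type on this node.
    public static int get(String bizType) {
        AtomicInteger counter = IN_FLIGHT.get(bizType);
        return counter == null ? 0 : counter.get();
    }

    // Called just before an asynchronous push is submitted.
    public static void increment(String bizType) {
        IN_FLIGHT.computeIfAbsent(bizType, k -> new AtomicInteger()).incrementAndGet();
    }

    // Called from the HttpAsyncClient completion callback, on success and failure alike.
    public static void decrement(String bizType) {
        AtomicInteger counter = IN_FLIGHT.get(bizType);
        if (counter != null) {
            counter.decrementAndGet();
        }
    }
}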
Pressure Testing
Resource Configuration
Instance count: 1, CPU 1 core, Memory 2 GB, JVM parameters:
-Xmx1g -Xms1g -Xmn512m -XX:SurvivorRatio=10 -XX:+UseConcMarkSweepGC -XX:+UseCMSCompactAtFullCollection -XX:CMSMaxAbortablePrecleanTime=5000 -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly -XX:+ExplicitGCInvokesConcurrent -XX:ParallelGCThreads=2 -Xloggc:/opt/modules/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/modules/java.hprof
Data Metrics and Tool Selection
During tests we monitor CPU, memory, network I/O, and database metrics using Arthas (Alibaba’s Java diagnostic tool) and Grafana dashboards.
Test Scenarios
We simulate backlogs of 5 000, 10 000, and 20 000 messages in the MQ and observe JVM heap usage, GC activity, and per-thread CPU consumption at 1 s and 5 s sampling intervals.
Key observations include rapid heap growth, frequent young GCs, and old-generation collections triggered once occupancy crosses the -XX:CMSInitiatingOccupancyFraction=80 threshold; when promotion outpaces the concurrent collection, these degrade into full GCs, causing long stop-the-world pauses, high CPU usage, and socket timeouts.
Optimization Plan
Problem Analysis
The downstream push rate cannot keep up with upstream MQ consumption, leading to heap saturation, frequent full GC, and eventual OOM.
Optimization Ideas
We address the bottleneck by adjusting JVM parameters and adding a JVM‑resource‑based flow‑control component.
JVM Parameter Optimization
Increase the heap to 1.5 GB (-Xmx1536M -Xms1536M) and enlarge the young generation to 1 GB (-Xmn1024M). Adjust the survivor ratio to 8 (-XX:SurvivorRatio=8), so Eden takes roughly 819 MB of the young generation and each survivor space about 102 MB, giving short-lived push payloads more room to die young instead of being promoted. Set metaspace to 256 MB and use -XX:+UseParNewGC for the young generation.
-Xmx1536M -Xms1536M -Xmn1024M -Xss1M -XX:MaxMetaspaceSize=256M -XX:MetaspaceSize=256M -XX:SurvivorRatio=8 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseCMSCompactAtFullCollection -XX:CMSMaxAbortablePrecleanTime=5000 -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly -XX:+ExplicitGCInvokesConcurrent -Xloggc:/opt/modules/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/modules/java.hprof
JVM Resource Flow-Control
We implement a limiter that monitors the heap-usage percentage. When it exceeds a configurable threshold (e.g., 70 %), the processing thread sleeps in short intervals and re-checks; if the heap is still above the threshold after a maximum blocking time, the limiter triggers an explicit GC (which runs as a concurrent cycle under -XX:+ExplicitGCInvokesConcurrent) and lets the thread resume once resources are released.
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.concurrent.TimeUnit;

import cn.hutool.core.util.NumberUtil;
import lombok.SneakyThrows;

public class ResourceLimitHandler {

    // Block new work once heap usage reaches this percentage
    private Integer threshold = 70;
    // How long to sleep between heap-usage checks (ms)
    private Integer sleepTime = 1000;
    // Maximum time to keep a thread blocked before forcing a GC (ms)
    private Integer maxBlockTime = 15000;

    private MemoryMXBean memoryMXBean = ManagementFactory.getMemoryMXBean();

    @SneakyThrows
    public void process() {
        long startTime = System.currentTimeMillis();
        double percent = this.getHeapUsedPercent();
        while (percent >= this.threshold) {
            // Waited longer than maxBlockTime: request an explicit GC
            // (a concurrent cycle under -XX:+ExplicitGCInvokesConcurrent) and stop blocking.
            if (this.maxBlockTime >= 0 && (System.currentTimeMillis() - startTime) > this.maxBlockTime) {
                synchronized (ResourceLimitHandler.class) {
                    // Double-check inside the lock so only one thread triggers the GC
                    if ((percent = this.getHeapUsedPercent()) >= this.threshold) {
                        System.gc();
                    }
                }
                return;
            }
            // Otherwise back off briefly and re-sample heap usage
            TimeUnit.MILLISECONDS.sleep(this.sleepTime);
            percent = this.getHeapUsedPercent();
        }
    }

    private double getHeapUsedPercent() {
        long max = this.getHeapMax();
        long used = this.getHeapUsed();
        // Hutool's NumberUtil.div performs the division in floating point
        return NumberUtil.div(used, max) * 100;
    }

    private long getHeapMax() {
        MemoryUsage memoryUsage = this.memoryMXBean.getHeapMemoryUsage();
        return memoryUsage.getMax();
    }

    private long getHeapUsed() {
        MemoryUsage memoryUsage = this.memoryMXBean.getHeapMemoryUsage();
        return memoryUsage.getUsed();
    }
}
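A minimal sketch of how this handler might be invoked on the consumer side before each push; the listener class and method names below are assumptions for illustration, not the platform's actual code:

// Hypothetical consumer-side wiring; class and method names are assumptions for illustration.
public class DataPushListener {

    private final ResourceLimitHandler resourceLimitHandler = new ResourceLimitHandler();

    public void onMessage(String bizType, String payload) {
        // Blocks (sleeping in 1 s steps, up to 15 s, possibly triggering an explicit GC)
        // while heap usage is at or above the configured threshold.
        resourceLimitHandler.process();

        // Only once the heap has headroom is the message handed to the asynchronous push client.
        doAsyncPush(bizType, payload);
    }

    private void doAsyncPush(String bizType, String payload) {
        // Placeholder for the HttpAsyncClient-based push described in the article.
    }
}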
Validation
We redeploy two instances (2 CPU cores, 4 GB memory each) with the optimized JVM settings and the new limiter, then push 50 000 messages through the MQ.
Results:
Scenario            | Total Push Time | Full GC Count | Total Full GC Time (ms) | Avg Full GC Time (ms)
Before Optimization | ≈35 min         | 312           | 309 232                 | 991
After Optimization  | ≈18 min         | 104           | 45 387                  | 436
The optimized configuration cuts the total push time nearly in half (from ≈35 min to ≈18 min), reduces the full GC count by roughly two thirds, and more than halves the average full GC pause, confirming the effectiveness of JVM tuning combined with heap-usage based flow control.
Conclusion
For data‑push services facing burst traffic, traditional concurrency‑based throttling alone may be insufficient. By enlarging the young generation, adjusting survivor ratios, and introducing a heap‑usage limiter, we can prevent excessive full GC, improve stability, and significantly boost throughput. The approach is most suitable for services using the CMS collector; adaptations are needed for G1 or other collectors.
ZCY Technology
ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.