
Handling Data Surge in a Data Push Platform: JVM Tuning, Flow Control, and Performance Optimization

This article analyzes the challenges of data‑burst scenarios in a data‑push platform, evaluates traditional throttling methods, presents JVM‑level tuning and a custom heap‑usage based flow‑control mechanism, and validates the solution through extensive pressure testing, demonstrating significant reductions in full GC frequency and overall push latency.


Introduction

In everyday work, data‑burst scenarios such as e‑commerce flash sales cause a massive influx of requests within a short time, overwhelming system resources and risking crashes if not pre‑emptively mitigated.

Business Background

Our application generates large volumes of business data (e.g., products, orders) that external platforms need to subscribe to. A data‑push platform consumes messages from an MQ, processes them, and asynchronously pushes the data to various partners.

Technical Background

Solution Selection

Typical high‑concurrency solutions include caching, circuit‑breaking (degradation), and rate‑limiting. Caching is unsuitable for our data‑push scenario, and a circuit‑breaker can handle downstream partner failures but does not address the root cause: limited system resources. We therefore chose rate‑limiting as the primary strategy.

Solution Application

Common rate‑limiting algorithms are token bucket, leaky bucket, and counter‑based. We took the counter‑based approach, which can be implemented with AtomicInteger, Semaphore, or a bounded thread pool; for this case we used a simple counter.
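To make the comparison concrete, here is a minimal sketch of the Semaphore variant mentioned above; the class name and limit value are illustrative, not from the original system:

```java
import java.util.concurrent.Semaphore;

// Illustrative Semaphore-based limiter: at most `limit` pushes in flight.
public class SemaphoreLimiter {
    private final Semaphore permits;

    public SemaphoreLimiter(int limit) {
        this.permits = new Semaphore(limit);
    }

    // Returns true if a push slot was acquired; the caller must release() when done.
    public boolean tryAcquire() {
        return permits.tryAcquire();
    }

    public void release() {
        permits.release();
    }

    public static void main(String[] args) {
        SemaphoreLimiter limiter = new SemaphoreLimiter(2);
        System.out.println(limiter.tryAcquire()); // true
        System.out.println(limiter.tryAcquire()); // true
        System.out.println(limiter.tryAcquire()); // false: both slots in use
        limiter.release();
        System.out.println(limiter.tryAcquire()); // true again
    }
}
```

A plain counter behaves the same way but lets the check and the limit be looked up per business type, which is why the counter form was preferred here.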

With synchronous pushes, throughput is capped by the number of MQ consumer threads (20 by default). Asynchronous pushes use Apache HttpAsyncClient (built on the Reactor model) with callback handling, and we limit flow per business data type based on the current number of in‑flight pushes.

// Current in-flight count for this business type on this node
Integer flowCount = BizFlowLimitUtil.get(data.getBizType());
// Flow limit configured for this business type
Integer overload = BizFlowLimitUtil.getOverloadOrDefault(data.getBizType(), this.defaultLimit);

if (flowCount >= overload) {
    throw new OverloadException("Business type: " + data.getBizType() + " is overloaded, threshold: " + overload + ", current load: " + flowCount);
}
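The BizFlowLimitUtil class itself is not shown in the original; a minimal sketch of such a per-business-type counter, assuming a ConcurrentHashMap of AtomicIntegers keyed by bizType, might look like the following. The increment would be called just before a push is submitted and the decrement from the HttpAsyncClient completion/failure callback; all names here are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-business-type in-flight counter backing the check above.
public class BizFlowLimitUtil {
    private static final Map<String, AtomicInteger> COUNTERS = new ConcurrentHashMap<>();
    private static final Map<String, Integer> OVERLOADS = new ConcurrentHashMap<>();

    // Current number of in-flight pushes for this business type.
    public static Integer get(String bizType) {
        AtomicInteger counter = COUNTERS.get(bizType);
        return counter == null ? 0 : counter.get();
    }

    // Configured limit for this business type, or the supplied default.
    public static Integer getOverloadOrDefault(String bizType, Integer defaultLimit) {
        return OVERLOADS.getOrDefault(bizType, defaultLimit);
    }

    // Called just before a push is submitted.
    public static void increment(String bizType) {
        COUNTERS.computeIfAbsent(bizType, k -> new AtomicInteger(0)).incrementAndGet();
    }

    // Called from the async completion/failure callback.
    public static void decrement(String bizType) {
        AtomicInteger counter = COUNTERS.get(bizType);
        if (counter != null) {
            counter.decrementAndGet();
        }
    }

    public static void main(String[] args) {
        increment("order");
        System.out.println("in-flight for order: " + get("order"));
    }
}
```

Pairing every increment with a decrement in the callback is the critical invariant; a missed decrement would permanently shrink the effective limit for that business type.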

Pressure Testing

Resource Configuration

Instance count: 1, CPU 1 core, Memory 2 GB, JVM parameters:

-Xmx1g -Xms1g -Xmn512m -XX:SurvivorRatio=10 -XX:+UseConcMarkSweepGC -XX:+UseCMSCompactAtFullCollection -XX:CMSMaxAbortablePrecleanTime=5000 -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly -XX:+ExplicitGCInvokesConcurrent -XX:ParallelGCThreads=2 -Xloggc:/opt/modules/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/modules/java.hprof

Data Metrics and Tool Selection

During tests we monitor CPU, memory, network I/O, and database metrics using Arthas (Alibaba’s Java diagnostic tool) and Grafana dashboards.

Test Scenarios

We simulate backlogs of 5 000, 10 000, and 20 000 messages in the MQ and observe JVM heap usage, GC activity, and per-thread CPU consumption at 1 s and 5 s sampling intervals.

Key observations include rapid heap growth, frequent young GCs, and eventually full GCs once old-generation occupancy crosses the -XX:CMSInitiatingOccupancyFraction=80 threshold, causing Stop-the-World pauses, high CPU usage, and socket timeouts.
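Beyond Arthas and Grafana, the same GC counters can be read in-process through the standard GarbageCollectorMXBean API; a minimal sketch follows. Collector names depend on the GC configuration (e.g., "ParNew" and "ConcurrentMarkSweep" under the CMS setup used here).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints per-collector GC counts and accumulated pause time,
// as reported by the JVM's own management beans.
public class GcStats {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Polling these counters once per second during a pressure run gives the same young-GC/full-GC trend lines described above without attaching an external tool.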

Optimization Plan

Problem Analysis

The downstream push rate cannot keep up with upstream MQ consumption, leading to heap saturation, frequent full GC, and eventual OOM.

Optimization Ideas

We address the bottleneck by adjusting JVM parameters and adding a JVM‑resource‑based flow‑control component.

JVM Parameter Optimization

Increase the heap to 1.5 GB ( -Xmx1536M -Xms1536M ) and enlarge the young generation to 1 GB ( -Xmn1024M ). Adjust the survivor ratio to 8 ( -XX:SurvivorRatio=8 ) and set metaspace to 256 MB. Use -XX:+UseParNewGC for the young generation, paired with CMS for the old generation.

-Xmx1536M -Xms1536M -Xmn1024M -Xss1M -XX:MaxMetaspaceSize=256M -XX:MetaspaceSize=256M -XX:SurvivorRatio=8 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseCMSCompactAtFullCollection -XX:CMSMaxAbortablePrecleanTime=5000 -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly -XX:+ExplicitGCInvokesConcurrent -Xloggc:/opt/modules/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/modules/java.hprof

JVM Resource Flow‑Control

We implement a limiter that monitors the heap-usage percentage: when it exceeds a configurable threshold (e.g., 70 %), the processing thread sleeps and re-checks; if usage stays above the threshold past a maximum block time, the limiter triggers a manual full GC to release resources before processing resumes.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.concurrent.TimeUnit;

import cn.hutool.core.util.NumberUtil;
import lombok.SneakyThrows;

public class ResourceLimitHandler {
    // Block new work once heap usage reaches this percentage.
    private Integer threshold = 70;
    // Sleep between heap-usage checks, in milliseconds.
    private Integer sleepTime = 1000;
    // Give up waiting after this long and force a full GC instead.
    private Integer maxBlockTime = 15000;
    private MemoryMXBean memoryMXBean = ManagementFactory.getMemoryMXBean();

    @SneakyThrows
    public void process() {
        long startTime = System.currentTimeMillis();
        double percent = this.getHeapUsedPercent();
        while (percent >= this.threshold) {
            // Waited longer than maxBlockTime: trigger a full GC and let the caller proceed.
            if (this.maxBlockTime >= 0 && (System.currentTimeMillis() - startTime) > this.maxBlockTime) {
                synchronized (ResourceLimitHandler.class) {
                    // Re-check inside the lock so only one thread calls System.gc().
                    if ((percent = this.getHeapUsedPercent()) >= this.threshold) {
                        System.gc();
                    }
                }
                return;
            }
            TimeUnit.MILLISECONDS.sleep(this.sleepTime);
            percent = this.getHeapUsedPercent();
        }
    }

    private double getHeapUsedPercent() {
        long max = this.getHeapMax();
        long used = this.getHeapUsed();
        // NumberUtil.div is Hutool's division helper.
        return NumberUtil.div(used, max) * 100;
    }

    private long getHeapMax() {
        MemoryUsage memoryUsage = this.memoryMXBean.getHeapMemoryUsage();
        return memoryUsage.getMax();
    }

    private long getHeapUsed() {
        MemoryUsage memoryUsage = this.memoryMXBean.getHeapMemoryUsage();
        return memoryUsage.getUsed();
    }
}
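In use, the limiter sits on the consumer path so that MQ consumption naturally stalls while the heap is above the threshold. The consumer class below is an illustrative sketch, with the limit check injected as a Runnable (e.g., resourceLimitHandler::process); the class name and wiring are assumptions, not from the original system.

```java
// Illustrative wiring: run the heap-usage check before processing each message.
public class PushMessageConsumer {
    private final Runnable limitCheck; // e.g. resourceLimitHandler::process

    public PushMessageConsumer(Runnable limitCheck) {
        this.limitCheck = limitCheck;
    }

    public void onMessage(String message) {
        limitCheck.run(); // blocks while heap usage >= threshold
        push(message);
    }

    private void push(String message) {
        // submit the asynchronous push here
        System.out.println("pushing: " + message);
    }

    public static void main(String[] args) {
        // Stand-in check that never blocks, for demonstration.
        PushMessageConsumer consumer = new PushMessageConsumer(() -> { });
        consumer.onMessage("order-1");
    }
}
```

Because the check runs on the consumer thread, back-pressure propagates to the MQ automatically: unacknowledged messages simply stay in the queue while the limiter waits.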

Validation

We redeploy two instances (2 CPU, 4 GB each) with the optimized JVM settings and the new limiter, then push 50 000 messages from MQ.

Results:

Scenario                Total Push Time   Full GC Count   Total Full GC Time (ms)   Avg Full GC Time (ms)
Before Optimization     ≈35 min           312             309 232                   991
After Optimization      ≈18 min           104             45 387                    436

The optimized configuration halves the push duration and reduces full GC frequency and latency, confirming the effectiveness of JVM tuning combined with heap‑usage based flow control.

Conclusion

For data‑push services facing burst traffic, traditional concurrency‑based throttling alone may be insufficient. By enlarging the young generation, adjusting survivor ratios, and introducing a heap‑usage limiter, we can prevent excessive full GC, improve stability, and significantly boost throughput. The approach is most suitable for services using the CMS collector; adaptations are needed for G1 or other collectors.

Tags: backend, JVM, performance testing, GC, flow control, resource limiting
Written by

ZCY Technology (政采云技术)

ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.
