
Using async-profiler to Optimize CPU Usage in a Dynamic QPS Test Case

The article details how the author used async-profiler to analyze a Java dynamic QPS test case, identified a CPU hotspot in a time‑checking method, replaced it with a timestamp check, and achieved a modest 0.1% reduction in overall CPU usage, illustrated with flame‑graph images and code snippets.

FunTester

Dec 14, 2022

I had long heard how useful CPU flame graphs and the tools that generate them can be, and today I finally tried one.

Unfortunately, for various reasons I could not use the flame graph plugin bundled with IntelliJ, so I turned to async-profiler instead.

While testing random-number performance with a dynamic QPS model test case, I learned how to use async-profiler and unexpectedly found an optimization that cut CPU usage by about 0.1 percentage points, my first result with the tool.

async-profiler

Installation and usage guides for async-profiler are easy to find online; for details, I recommend the Wiki in its GitHub repository.

Case code

Below is the case code that uses a dynamic QPS model.

<code style="padding: 16px; color: #333; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px"><span style="line-height: 26px"><span style="font-weight: bold; line-height: 26px">class</span> <span style="color: #458; font-weight: bold; line-height: 26px">T</span> <span style="font-weight: bold; line-height: 26px">extends</span> <span style="color: #458; font-weight: bold; line-height: 26px">SourceCode</span> {</span><br/><br/>    <span style="font-weight: bold; line-height: 26px">static</span> <span style="font-weight: bold; line-height: 26px">void</span> main(String[] args) {<br/>        <span style="font-weight: bold; line-height: 26px">def</span> total = <span style="color: #008080; line-height: 26px">1000</span>_0000<br/>        <span style="font-weight: bold; line-height: 26px">def</span> index = <span style="font-weight: bold; line-height: 26px">new</span> AtomicInteger()<br/>        <span style="font-weight: bold; line-height: 26px">int</span> i = <span style="color: #008080; line-height: 26px">0</span><br/>        <span style="font-weight: bold; line-height: 26px">def</span> test = {<br/>            i++ % total<br/>            <span style="color: #998; font-style: italic; line-height: 26px">//            index.getAndIncrement() % total</span><br/>            getRandomInt(total)<br/>            sleep(<span style="color: #008080; line-height: 26px">0.01</span>)<br/>        }<br/>        <span style="font-weight: bold; line-height: 26px">new</span> FunQpsConcurrent(test, <span style="color: #d14; line-height: 26px">"Test random performance"</span>).start()<br/>    }<br/>}<br/><br/></code>

The method that executes the task is

com.okcoin.hickwall.presses.funtester.frame.execute.FunQpsConcurrent#start

and its code is shown below:

<code style="padding: 16px; color: #333; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px">    <span style="line-height: 26px"><span style="font-weight: bold; line-height: 26px">void</span> <span style="color: #900; font-weight: bold; line-height: 26px">start</span><span style="line-height: 26px">()</span> </span>{<br/>        <span style="font-weight: bold; line-height: 26px">if</span> (executor == <span style="font-weight: bold; line-height: 26px">null</span>) executor = ThreadPoolUtil.createCachePool(Constant.THREADPOOL_MAX, <span style="color: #d14; line-height: 26px">"Q"</span>)<br/>        <span style="font-weight: bold; line-height: 26px">if</span> (Common.PERF_PLATFORM) controller = <span style="font-weight: bold; line-height: 26px">new</span> RedisController(<span style="font-weight: bold; line-height: 26px">this</span>)<br/>        <span style="font-weight: bold; line-height: 26px">if</span> (controller == <span style="font-weight: bold; line-height: 26px">null</span>) controller = <span style="font-weight: bold; line-height: 26px">new</span> FunTester();<br/>        <span style="font-weight: bold; line-height: 26px">new</span> Thread(controller, <span style="color: #d14; line-height: 26px">"receiver"</span>).start();<br/>        <span style="font-weight: bold; line-height: 26px">while</span> (key) {<br/>            ThreadPoolUtil.executeTask(executor, qps, produce, total, name)<br/>        }<br/>        stop()<br/>    }<br/><br/></code>
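The body of ThreadPoolUtil#executeTask is not shown in this article. As a rough illustration of what one pass of a QPS-model loop could look like, the sketch below submits a fixed number of tasks to a thread pool and then waits out the rest of the time window. The class name, method name, and windowMillis parameter are hypothetical, and the use of a LongAdder is only inferred from the total.sumThenReset() call that appears later in the article.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch; this is not the FunTester implementation.
class QpsLoopSketch {

    // Submits `qps` tasks, then sleeps for whatever remains of the window.
    static void submitOneWindow(ExecutorService executor, int qps, Runnable task,
                                LongAdder total, long windowMillis) throws InterruptedException {
        long startNanos = System.nanoTime();
        for (int i = 0; i < qps; i++) {
            executor.execute(() -> {
                task.run();
                total.increment(); // counts finished tasks for the "actual QPS" figure
            });
        }
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
        long remainingMillis = windowMillis - elapsedMillis;
        if (remainingMillis > 0) {
            Thread.sleep(remainingMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newCachedThreadPool();
        LongAdder total = new LongAdder();
        for (int window = 0; window < 3; window++) {
            submitOneWindow(executor, 100, () -> { }, total, 1000);
            // Tasks still in flight are counted in the next window.
            System.out.println("actual QPS: " + total.sumThenReset());
        }
        executor.shutdown();
    }
}
```

In a loop of this shape the calling thread does very little work per window, which would be consistent with the small CPU percentages reported for the main thread in the next section.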

Optimization Process

The main thread spends most of its time in the while loop, so I first generated a flame graph of the main thread.

In the pre-optimization flame graph, the method

com.okcoin.hickwall.presses.funtester.frame.execute.ThreadPoolUtil#executeTask

consumes 0.53% of CPU, and the getSecond method accounts for the largest share because it creates a Calendar object. The relevant code is:

<code style="padding: 16px; color: #333; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px">        <span style="font-weight: bold; line-height: 26px">if</span> (Time.getSecond() % COUNT_INTERVAL == <span style="color: #008080; line-height: 26px">0</span>) {<br/>            <span style="font-weight: bold; line-height: 26px">int</span> real = total.sumThenReset() / COUNT_INTERVAL as <span style="font-weight: bold; line-height: 26px">int</span><br/>            def active = executor.getActiveCount()<br/>            def count = active == <span style="color: #008080; line-height: 26px">0</span> ? <span style="color: #008080; line-height: 26px">1</span> : active<br/>            log.info(<span style="color: #d14; line-height: 26px">"{} design QPS:{},actual QPS:{} active thread:{} per thread efficiency:{}"</span>, name, qps, real, active, real / count as <span style="font-weight: bold; line-height: 26px">int</span>)<br/>        }<br/></code>

The intent of this check is to log the design QPS, actual QPS, and active thread count every few seconds. I suspected that checking a timestamp would be cheaper, so I replaced the code as follows:

<code style="padding: 16px; color: #333; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px">        <span style="font-weight: bold; line-height: 26px">if</span> (SourceCode.getMark() % COUNT_INTERVAL == <span style="color: #008080; line-height: 26px">0</span>) {<br/>            <span style="font-weight: bold; line-height: 26px">int</span> real = total.sumThenReset() / COUNT_INTERVAL as <span style="font-weight: bold; line-height: 26px">int</span><br/>            def active = executor.getActiveCount()<br/>            def count = active == <span style="color: #008080; line-height: 26px">0</span> ? <span style="color: #008080; line-height: 26px">1</span> : active<br/>            log.info(<span style="color: #d14; line-height: 26px">"{} design QPS:{},actual QPS:{} active thread:{} per thread efficiency:{}"</span>, name, qps, real, active, real / count as <span style="font-weight: bold; line-height: 26px">int</span>)<br/>        }<br/></code>
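The implementations of Time.getSecond() and SourceCode.getMark() are not shown in this article. As a minimal sketch, assuming the former reads the second field from a Calendar and the latter derives whole seconds from System.currentTimeMillis(), the two checks might compare as follows. The class name, method names, and COUNT_INTERVAL value are hypothetical, and the time zone is pinned to UTC only to keep the example deterministic.

```java
import java.util.Calendar;
import java.util.TimeZone;

// Hypothetical stand-ins for Time.getSecond() and SourceCode.getMark();
// the real FunTester implementations are not shown in the article.
class SecondCheckSketch {

    // Assumed value. The two checks below agree only when it divides 60.
    static final int COUNT_INTERVAL = 5;

    // Assumed Calendar-based variant: allocates and populates a Calendar on every call.
    static int secondViaCalendar(long epochMillis) {
        Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        calendar.setTimeInMillis(epochMillis);
        return calendar.get(Calendar.SECOND); // second within the minute, 0..59
    }

    // Assumed timestamp-based variant: a single division, no allocation.
    static long secondViaTimestamp(long epochMillis) {
        return epochMillis / 1000; // whole seconds since the epoch
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        boolean viaCalendar = secondViaCalendar(now) % COUNT_INTERVAL == 0;
        boolean viaTimestamp = secondViaTimestamp(now) % COUNT_INTERVAL == 0;
        System.out.println("calendar check: " + viaCalendar
                + ", timestamp check: " + viaTimestamp);
    }
}
```

Under these assumptions the timestamp variant avoids creating an object on each pass through the loop, which is the cost the flame graph attributed to getSecond.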

After rebuilding and rerunning the test, I captured another flame graph of the main thread.

After the optimization, the CPU usage of

com.okcoin.hickwall.presses.funtester.frame.execute.ThreadPoolUtil#executeTask

dropped to 0.29%, while the

com.okcoin.hickwall.presses.funtester.frame.execute.FunQpsConcurrent#start

method now consumes 0.44% CPU, 0.09 percentage points lower than the original 0.53%.

Rounded, the overall improvement is about 0.1 percentage points, which I count as a successful optimization. I also noticed that most of the remaining CPU time is spent in the sleep method, so my earlier conclusions about random-number performance may need revisiting.
