Using async-profiler to Optimize CPU Usage in a Dynamic QPS Test Case
The article details how the author used async-profiler to analyze a Java dynamic QPS test case, identified a CPU hotspot in a time‑checking method, replaced it with a timestamp check, and achieved a modest 0.1% reduction in overall CPU usage, illustrated with flame‑graph images and code snippets.
I have long heard about the powerful capabilities of CPU flame graphs and various flame‑graph tools, and today I finally started trying a CPU flame‑graph generation tool.
Unfortunately, the flame‑graph plugin bundled with IntelliJ could not be used for various reasons, so I turned to the async‑profiler analysis tool as a replacement.
While testing random‑number performance with a dynamic QPS model test case, I learned how to use async‑profiler and unexpectedly found an optimization that cut CPU usage by about 0.1%, my first result with the tool.
async-profiler
The installation and usage guide for this tool can be found online; I recommend checking the GitHub repository Wiki for details.
Case code
Below is the case code that uses a dynamic QPS model.
```groovy
import java.util.concurrent.atomic.AtomicInteger

class T extends SourceCode {

    static void main(String[] args) {
        def total = 1000_0000 // 10,000,000: bound for the modulo and the random range
        def index = new AtomicInteger()
        int i = 0
        // Closure executed for every request issued by the QPS model
        def test = {
            i++ % total
            // index.getAndIncrement() % total
            getRandomInt(total)
            sleep(0.01)
        }
        new FunQpsConcurrent(test, "random performance test").start()
    }
}
```

The method that executes the task is com.okcoin.hickwall.presses.funtester.frame.execute.FunQpsConcurrent#start, and its code is shown below:
```groovy
void start() {
    if (executor == null) executor = ThreadPoolUtil.createCachePool(Constant.THREADPOOL_MAX, "Q")
    if (Common.PERF_PLATFORM) controller = new RedisController(this)
    if (controller == null) controller = new FunTester();
    new Thread(controller, "receiver").start();
    while (key) {
        ThreadPoolUtil.executeTask(executor, qps, produce, total, name)
    }
    stop()
}
```

Optimization Process
The entire main thread spends most of its time in the while loop. I first generated a flame graph of the main thread, shown below:
From the pre‑optimization flame graph, the method com.okcoin.hickwall.presses.funtester.frame.execute.ThreadPoolUtil#executeTask consumes 0.53% CPU, with the getSecond method taking the largest share because it creates a Calendar object. The relevant code is:
```groovy
if (Time.getSecond() % COUNT_INTERVAL == 0) {
    int real = total.sumThenReset() / COUNT_INTERVAL as int
    def active = executor.getActiveCount()
    def count = active == 0 ? 1 : active
    log.info("{} design QPS:{},actual QPS:{} active thread:{} per thread efficiency:{}", name, qps, real, active, real / count as int)
}
```

The original intention was to output design QPS, actual QPS, and active thread count every few seconds. I suspected that using a timestamp check would be faster, so I replaced the code as follows:
```groovy
if (SourceCode.getMark() % COUNT_INTERVAL == 0) {
    int real = total.sumThenReset() / COUNT_INTERVAL as int
    def active = executor.getActiveCount()
    def count = active == 0 ? 1 : active
    log.info("{} design QPS:{},actual QPS:{} active thread:{} per thread efficiency:{}", name, qps, real, active, real / count as int)
}
```

After rebuilding and running the test, I captured another flame graph for the main thread, shown below:
After the optimization, the CPU usage of com.okcoin.hickwall.presses.funtester.frame.execute.ThreadPoolUtil#executeTask dropped to 0.29%, and the com.okcoin.hickwall.presses.funtester.frame.execute.FunQpsConcurrent#start method now consumes 0.44% CPU, a reduction of 0.09 percentage points compared with the original 0.53%.
Rounded, the overall improvement is about 0.1%, which I consider a successful optimization. I also noticed that most of the remaining CPU time is spent in the sleep method, suggesting that the earlier conclusions about random‑number performance may need revisiting.
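To make the substitution concrete, here is a minimal Java sketch of the two ways of obtaining "the current second". The article does not show the bodies of Time.getSecond() or SourceCode.getMark(), so both helpers below are assumptions: one builds a Calendar per call, the other divides the millisecond timestamp by 1000. The class name, the interval value of 5, and the loop size are hypothetical as well.

```java
import java.util.Calendar;
import java.util.TimeZone;

// Sketch only: the two helpers are assumed stand-ins, not the FunTester sources.
final class SecondCheckSketch {

    // Assumed shape of Time.getSecond(): builds a Calendar on every call.
    static int secondViaCalendar(long epochMillis) {
        Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        calendar.setTimeInMillis(epochMillis);
        return calendar.get(Calendar.SECOND); // second of the minute, 0-59
    }

    // Assumed shape of SourceCode.getMark(): epoch seconds from plain arithmetic.
    static long secondViaTimestamp(long epochMillis) {
        return epochMillis / 1000;
    }

    // The periodic-report condition used in the article's if statement.
    static boolean shouldReport(long second, int interval) {
        return second % interval == 0;
    }

    public static void main(String[] args) {
        int interval = 5; // hypothetical COUNT_INTERVAL
        int rounds = 200_000;
        long base = System.currentTimeMillis();
        int hits = 0;

        long t0 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            if (shouldReport(secondViaCalendar(base + i), interval)) hits++;
        }
        long calendarNanos = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            if (shouldReport(secondViaTimestamp(base + i), interval)) hits--;
        }
        long timestampNanos = System.nanoTime() - t1;

        // hits is 0 when both variants agree on every sample.
        System.out.println("difference in hits: " + hits);
        System.out.println("Calendar variant:  " + calendarNanos / rounds + " ns/call");
        System.out.println("timestamp variant: " + timestampNanos / rounds + " ns/call");
    }
}
```

In this sketch the two conditions agree whenever the interval divides 60, because epoch seconds modulo 60 equal the second of the minute in UTC. The timing loop is only a rough indication, not a rigorous benchmark; a flame graph or a proper benchmark harness gives a more reliable picture.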
-- By FunTester
