Boost Cloud Application Speed by 36% Using Baidu’s Btune Performance Diagnostic Tool
Migrating workloads to a new CPU platform can cause unexpected performance regressions. Baidu Cloud's Btune tool provides automated, multi‑dimensional analysis and actionable optimization suggestions. In the test described here, memory and NUMA tuning based on those suggestions cut a test program's execution time by 36.8%.
Background
Developers often encounter surprising performance drops after moving services to newer CPU platforms or launching new workloads, with latency sometimes doubling despite the hardware upgrade.
Btune Overview
Btune, Baidu Intelligent Cloud's Application Performance Diagnostic Tool, offers one‑click performance tuning for cloud workloads. It leverages Baidu’s extensive experience across Intel, AMD, and ARM CPUs and various business scenarios (recommendation, search, advertising, big data, databases, video transcoding) to automatically identify bottlenecks and generate optimization recommendations.
Analysis Dimensions
CPU, memory, disk, network, and concurrency are examined.
Analysis spans application, runtime, system, and hardware layers.
Test Case Description
A simple C program repeatedly calls memset and memcpy on large arrays. It is launched under numactl to simulate cross‑NUMA memory access. The source code is:
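#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ARRAY_SIZE 1000000000  /* elements per array; about 4 GB each as int */

int main(void) {
    /* Two large buffers so that memset and memcpy dominate the run time. */
    int *a = malloc(sizeof(int) * ARRAY_SIZE);
    int *b = malloc(sizeof(int) * ARRAY_SIZE);
    if (a == NULL || b == NULL) {
        fprintf(stderr, "malloc failed\n");
        return 1;
    }
    /* Loop forever so the diagnostic tool has a long-running process to sample. */
    while (1) {
        memset(a, 0, sizeof(int) * ARRAY_SIZE);
        memset(b, 0, sizeof(int) * ARRAY_SIZE);
        memcpy(b, a, sizeof(int) * ARRAY_SIZE);
    }
    return 0;  /* not reached */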
}
Step‑by‑Step Usage of Btune
Log in to the Baidu Cloud console and create a cloud server instance.
Upload and start the test program on the instance.
Open the “Self‑service Diagnosis” tool, select “Performance Detection”, choose the server and the test process, and start data collection.
After a few minutes, view the analysis summary report, which lists bottlenecks and optimization suggestions.
Open the detailed report to explore CPU, memory, network, disk, and concurrency metrics, as well as hotspot functions and flame graphs.
Analysis Findings
The summary report's key suggestions included:
Upgrade the glibc library (hotspot functions memset and memcpy benefit from glibc 2.33).
Reduce cross‑NUMA memory accesses (current usage is 100%).
Detailed diagnostics showed:
CPU: No issues with kernel networking, storage, or scheduling; primary risk lies in glibc hotspot functions.
Memory: No leaks, uses anonymous huge pages, but high cross‑NUMA usage.
Concurrency: Single thread, no split‑lock or context‑switch problems.
Optimization Results
Applying only the NUMA suggestion (disabling cross‑NUMA access while keeping glibc 2.17) reduced execution time from 2.576 s to 1.821 s, a 29.3% improvement. Applying both suggestions (upgrading to glibc 2.33 and disabling cross‑NUMA access) reduced it further to 1.626 s, a total gain of 36.8% over the baseline.
Conclusion
Btune enables even junior operations engineers to perform high‑level performance tuning by automatically locating bottlenecks across multiple dimensions and providing concrete, actionable recommendations, delivering significant speedups for cloud applications.
