Boost Cloud App Performance by 36% with Baidu’s Btune Diagnostic Tool
This article explains how Baidu Cloud’s Btune performance‑diagnostic tool helps identify CPU, memory and NUMA bottlenecks, provides automatic optimization suggestions, and demonstrates a real‑world test that improves a memory‑intensive program’s runtime by up to 36.8% after applying the recommended changes.
When migrating services to a newer CPU platform, developers often encounter unexpected performance regressions such as increased latency or reduced throughput, which can be puzzling for both developers and operations engineers.
To quickly locate bottlenecks and guide optimization, Baidu Intelligent Cloud offers the Application Performance Diagnostic Tool, Btune. Btune works like a one‑click performance tuner for cloud workloads: it automatically analyzes the CPU, memory, disk, network, and concurrency dimensions and generates actionable recommendations.
Key Features of Btune
Multi‑dimensional analysis (CPU, memory, disk, network, concurrency).
Automatic generation of optimization suggestions based on Baidu’s extensive tuning experience across Intel, AMD, and ARM CPUs.
Visualized data dashboards for both summary and detailed reports.
Demo Test Case
A simple test program repeatedly calls memset and memcpy on large arrays and is executed under numactl to simulate cross‑NUMA memory access. The source code is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ARRAY_SIZE 1000000000  /* 1e9 ints, about 4 GB per array with 4-byte int */

int main(void)
{
    size_t bytes = sizeof(int) * ARRAY_SIZE;
    int *a = malloc(bytes);
    int *b = malloc(bytes);
    if (a == NULL || b == NULL) {
        fprintf(stderr, "allocation failed\n");
        return EXIT_FAILURE;
    }
    /* Loop until the process is killed. */
    while (1) {
        memset(a, 0, bytes);
        memset(b, 0, bytes);
        memcpy(b, a, bytes);
    }
}

Btune analyzes the program and returns two categories of suggestions:
Memory‑operation hotspots: upgrade the glibc library (e.g., from 2.17 to 2.33) to improve memset and memcpy performance.
NUMA‑related advice: reduce cross‑NUMA memory accesses.
Using Btune
Log in to the Baidu Cloud console and create a cloud server instance.
Upload and start the test program on the instance.
Open the “Self‑service Diagnosis” tool, select “Performance Detection”, choose the instance and the test process, and start data collection.
After a few minutes, view the Analysis Summary Report, which lists bottleneck items and concrete recommendations.
Click “View Detailed Report” to explore deeper metrics for CPU, memory, network, disk, and concurrency.
The summary report highlights three optimization items: (1) hotspot percentages for memset and memcpy with a recommendation to upgrade glibc; (2) 100% cross‑NUMA memory usage with a suggestion to limit NUMA crossing; (3) overall resource distribution.
Optimization Results
Baseline execution time of the program: 2.576 seconds.
After applying only the NUMA recommendation (no cross‑NUMA access, glibc still at 2.17): 1.821 seconds (29.3% improvement).
After applying both recommendations (glibc upgraded to 2.33 and no cross‑NUMA access): 1.626 seconds (36.8% improvement).
These results demonstrate that Btune can turn a seemingly opaque performance issue into clear, measurable gains.
Baidu Intelligent Cloud Tech Hub
