Why Does CPU Spike During Load Tests? Uncovering Hidden DB Row‑Read Bottlenecks
During a load test, QPS and TPS looked normal while database CPU usage climbed unexpectedly. A step-by-step investigation traced the cause to excessive row reads from accumulating task records. This article walks through the investigation, the fix, and what it shows about database read behavior.
Business Background
The product added a stock limit to cash‑exchange channels, creating a flash‑sale scenario where the core API QPS surged nearly 600× during peak, necessitating a load test to gauge system and database limits.
Load Test Preparation
Evaluate test traffic: estimate QPS per interface and downstream traffic.
Test adaptation: mock real‑user flows because test accounts leave no historical traces, ensuring traffic distribution matches production.
Test & release: verify code changes with QA before running the load test; use HSF console to simulate shadow traffic.
Downstream traffic notification: inform downstream services before testing.
Prepare test data: configure interface traffic, subtract background traffic, and request test accounts.
Small‑traffic dry run: run with ~1% traffic to catch early issues.
Problem Occurrence
During the test, CPU utilization grew faster than traffic: at 10% traffic CPU was 11% (normal), at 30% it was 20% (steady), at 50% it was 30% (expected), and at 80% it was 50% (concerning). At 100% traffic CPU reached 80% and then suddenly jumped to 100%, forcing the test to stop.
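The figures above can be turned into a quick check for the step at which CPU starts growing faster than traffic. The following is a minimal sketch, not part of the original investigation: the class name, method names, and the 0.6 threshold are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: find the traffic step where CPU starts growing faster than traffic. */
final class CpuSlopeCheck {

    /** CPU percentage points gained per traffic percentage point, for each consecutive step. */
    static List<Double> marginalCost(int[] trafficPct, int[] cpuPct) {
        List<Double> slopes = new ArrayList<>();
        for (int i = 1; i < trafficPct.length; i++) {
            double trafficDelta = trafficPct[i] - trafficPct[i - 1];
            double cpuDelta = cpuPct[i] - cpuPct[i - 1];
            slopes.add(cpuDelta / trafficDelta);
        }
        return slopes;
    }

    /** Index of the first step whose slope exceeds the threshold, or -1 if none does. */
    static int firstSuperlinearStep(List<Double> slopes, double threshold) {
        for (int i = 0; i < slopes.size(); i++) {
            if (slopes.get(i) > threshold) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] traffic = {10, 30, 50, 80, 100}; // traffic levels reported in the test
        int[] cpu = {11, 20, 30, 50, 80};      // CPU readings reported at each level
        List<Double> slopes = marginalCost(traffic, cpu);
        // The threshold is a hypothetical alerting value, not one from the article.
        System.out.println("first steep step index: " + firstSuperlinearStep(slopes, 0.6));
    }
}
```

With these readings, each traffic point costs about 0.45 to 0.5 CPU points up to 50% traffic, and 1.5 CPU points between 80% and 100%. With the assumed threshold, the step from 50% to 80% is the first one flagged.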
Problem Investigation
Initial suspicion fell on the test‑specific code causing high QPS/TPS, but DB‑service metrics showed QPS stayed stable while CPU spiked, ruling out that hypothesis.
The DBA's analysis attributed the prolonged high-CPU periods to performance degradation under sustained load, and suggested checking additional DB metrics.
Discovery of Row‑Read Anomaly
Among the DB performance indicators, the "row read" metric stood out: it increased steadily and tracked the CPU curve at 80% traffic, which made it a likely cause.
Compared with the normal peak, row reads during the test were about ten times higher, confirming the anomaly.
SQL Location
The problematic SQL was identified by comparing test and normal snapshots; the test version read many more rows.
private TaskInstanceParam createQueryParamByEffectiveTime(TaskQueryParam queryParam) {
    final TaskInstanceParam dbQueryParam = new TaskInstanceParam();
    // Current time as seen by this user
    Date now = TimeTravelManager.getCurrentTime(queryParam.getUserId());
    // Match this user's tasks whose effective window contains "now"
    dbQueryParam.createCriteria()
        .andUserIdEqualTo(queryParam.getUserId())
        .andBizTypeEqualTo(queryParam.getBizType())
        .andTemplateIdEqualTo(queryParam.getSubBizType())
        .andEffectiveStartTimeLessThanOrEqualTo(now)
        .andEffectiveEndTimeGreaterThan(now);
    // Newest effective start time first, limited to a single row
    dbQueryParam.appendOrderByClause(OrderCondition.EFFECTIVESTARTTIME, SortType.DESC);
    dbQueryParam.setPagination(1, 1);
    return dbQueryParam;
}
This query selects the latest task whose effective time window contains the current time. Each registration inserts a new task, so every repeated registration for the same account leaves one more row that matches the filter. The query still returns a single row, but the number of rows it reads keeps growing.
Root Cause
During the load test, each registration added another task record for the same account. The query therefore had to scan an ever-increasing number of rows per call, so the database spent more CPU time on every query even though QPS stayed flat. This explains the rising CPU usage.
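The effect can be reproduced with a toy in-memory model. This is a sketch under stated assumptions rather than the real schema: the Task class, its field names, and the idea that an index on the user id narrows the scan to one user's rows are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not the real schema): rows examined by a "latest effective task" lookup. */
final class RowReadModel {

    /** Minimal stand-in for a task row; field names are hypothetical. */
    static final class Task {
        final long userId;
        final long effectiveStart;
        final long effectiveEnd;

        Task(long userId, long effectiveStart, long effectiveEnd) {
            this.userId = userId;
            this.effectiveStart = effectiveStart;
            this.effectiveEnd = effectiveEnd;
        }
    }

    private final List<Task> table = new ArrayList<>();
    private long rowsExamined = 0;

    void register(long userId, long start, long end) {
        table.add(new Task(userId, start, end));
    }

    /** Same shape as the query: filter by user and time window, newest start first, limit 1. */
    Task latestEffective(long userId, long now) {
        Task best = null;
        for (Task t : table) {
            if (t.userId != userId) {
                continue; // assumption: an index on the user id skips other users' rows
            }
            rowsExamined++; // every row of this user is still examined
            boolean effective = t.effectiveStart <= now && t.effectiveEnd > now;
            if (effective && (best == null || t.effectiveStart > best.effectiveStart)) {
                best = t;
            }
        }
        return best;
    }

    /** Total rows examined over the given number of lookups for one account. */
    static long totalRowsExamined(int requests, boolean registerEachRequest) {
        RowReadModel db = new RowReadModel();
        long userId = 42L;
        if (!registerEachRequest) {
            db.register(userId, 0, Long.MAX_VALUE); // one pre-existing task, never added to
        }
        for (int i = 1; i <= requests; i++) {
            if (registerEachRequest) {
                db.register(userId, i, Long.MAX_VALUE); // load-test behavior: new task per request
            }
            db.latestEffective(userId, requests + 1L);
        }
        return db.rowsExamined;
    }

    public static void main(String[] args) {
        System.out.println(totalRowsExamined(100, true));  // accumulating tasks: 1 + 2 + ... + 100
        System.out.println(totalRowsExamined(100, false)); // stable task count: one row per lookup
    }
}
```

In this model, 100 lookups against an account that registers a task on every request examine 5050 rows in total, while 100 lookups against an account with a single stable task examine 100. The request rate is identical in both cases, which is consistent with QPS looking normal while row reads climb.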
Solutions
Mock the userId used in queries: substitute a pre-selected real userId so that the number of rows each query reads stays stable.
Keep the insertion logic unchanged. Once the query userId is mocked, newly inserted task records no longer affect the rows the query reads.
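One way to picture the mocking step is a small resolver that swaps the userId only on the query path. This is a hedged sketch: QueryUserIdResolver, resolveForQuery, and the isLoadTestTraffic flag are hypothetical names, and using a small pool instead of a single userId is an assumption added here (a one-element pool matches the single pre-selected userId described above).

```java
import java.util.List;

/** Sketch: route load-test queries to pre-selected real userIds; all names are hypothetical. */
final class QueryUserIdResolver {

    private final List<Long> preselectedUserIds;

    QueryUserIdResolver(List<Long> preselectedUserIds) {
        if (preselectedUserIds.isEmpty()) {
            throw new IllegalArgumentException("at least one pre-selected userId is required");
        }
        this.preselectedUserIds = List.copyOf(preselectedUserIds);
    }

    /**
     * Returns the userId to use in the task query. Production traffic is untouched;
     * load-test traffic is mapped onto the pre-selected pool so queried row counts stay stable.
     */
    long resolveForQuery(long requestUserId, boolean isLoadTestTraffic) {
        if (!isLoadTestTraffic) {
            return requestUserId;
        }
        // floorMod keeps the slot non-negative even for negative ids
        int slot = (int) Math.floorMod(requestUserId, (long) preselectedUserIds.size());
        return preselectedUserIds.get(slot);
    }

    public static void main(String[] args) {
        QueryUserIdResolver resolver = new QueryUserIdResolver(List.of(1001L, 1002L));
        System.out.println(resolver.resolveForQuery(7L, false)); // production: unchanged
        System.out.println(resolver.resolveForQuery(7L, true));  // load test: mapped to the pool
    }
}
```

Only the query path calls the resolver. Inserts keep using the test account's own userId, so the rows they add are never read by the mocked queries.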
Principle Explanation
Row reads represent the number of rows a query must examine. High row reads increase CPU usage because the database must process each row. Logical reads (data from cache) are cheaper than physical reads (disk I/O). Optimizing indexes and SQL can reduce both.
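Because QPS alone hid the problem, a derived "rows read per query" figure makes this kind of regression easier to spot. The sketch below assumes the database exposes two cumulative counters, one for rows read and one for queries executed; the Sample class and the numbers in main are illustrative assumptions, not measurements from the test.

```java
/** Sketch: derive rows read per query from two samples of cumulative counters. */
final class RowsPerQuery {

    /** One sample of two cumulative counters; the shape is hypothetical. */
    static final class Sample {
        final long rowsRead;
        final long queries;

        Sample(long rowsRead, long queries) {
            this.rowsRead = rowsRead;
            this.queries = queries;
        }
    }

    /** Average rows read per query in the interval between two samples. */
    static double between(Sample earlier, Sample later) {
        long queryDelta = later.queries - earlier.queries;
        if (queryDelta <= 0) {
            return 0.0; // no queries in the interval (or counters were reset)
        }
        return (double) (later.rowsRead - earlier.rowsRead) / queryDelta;
    }

    public static void main(String[] args) {
        // Illustrative values only: both intervals contain the same number of queries.
        Sample t0 = new Sample(0, 0);
        Sample t1 = new Sample(50_000, 10_000);
        Sample t2 = new Sample(550_000, 20_000);
        System.out.println(between(t0, t1)); // baseline interval
        System.out.println(between(t1, t2)); // later interval with the same query count
    }
}
```

In this example both intervals contain 10,000 queries, yet rows per query rises from 5 to 50. A QPS graph would look flat across both, while the per-query cost has grown tenfold.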
Reflection
The CPU rise at 80% traffic was already an early warning and should have triggered investigation then, before CPU reached 100%.
Load tests should monitor a broader set of DB metrics, not just CPU.
Mocking strategies must match the actual usage pattern. Repeatedly registering tasks for the same account does not reflect the flash-sale scenario and was unsuitable for this test.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
