Analyzing and Optimizing MySQL Performance on Intel Skylake CPUs
Meituan’s DBA team found that Intel Skylake CPUs sharply increased the latency of the PAUSE instruction, so MySQL’s ut_delay spin‑wait loops consumed most CPU cycles and write throughput dropped. They restored performance by back‑porting MySQL 8.0’s spin_wait_pause_multiplier patch to 5.7, upgrading to CentOS 7, and moving to Cascade Lake hardware.
MySQL, due to its open‑source nature and mature ecosystem, is the de‑facto relational database for the Internet. This article shares a practical case from Meituan’s MySQL DBA team, where the evolution of the Intel PAUSE instruction caused a performance bottleneck on Skylake‑based servers.
1. Background – In 2017 Intel released the Purley platform with Xeon Scalable processors (Platinum, Gold, Silver, Bronze). Meituan migrated many MySQL instances to Purley/Skylake servers (e.g., Silver 4110). Although the new CPUs promised ~10 % higher raw performance, the team observed higher CPU load and reduced TPS.
2. Performance problem analysis – Benchmarking showed that on OLTP write‑only workloads the Purley 4110 performed worse than older platforms, and that CentOS 7 gave a modest improvement over CentOS 6.
3. CPU performance tracing
3.1 Hot‑spot identification – Using perf top, perf record and flame graphs, the team pinpointed ut_delay as the function consuming the most CPU cycles.
# Children Self Command Shared Object Symbol
93.54% 0.00% mysqld libpthread-2.17.so [.] start_thread
|---start_thread
|--77.07%--pfs_spawn_thread
| |--77.05%--handle_connection
| |--76.97%--do_command
| |--74.30%--dispatch_command
| |--71.16%--mysqld_stmt_execute
| |--70.74%--Prepared_statement::execute_loop
| |--69.53%--Prepared_statement::execute
| |--67.90%--mysql_execute_command
| |--23.43%--trans_commit_stmt
| |--23.30%--ha_commit_trans
| |--18.86%--MYSQL_BIN_LOG::commit
| |--18.18%--MYSQL_BIN_LOG::ordered_commit
| |--8.02%--MYSQL_BIN_LOG::change_stage
| |--2.35%--__lll_unlock_wake
| |--2.24%--system_call_fastpath
| |--2.23%--do_futex
| |--1.38%--wake_up_q
| |--1.33%--try_to_wake_up
3.2 Relationship between ut_delay and the PAUSE instruction – MySQL implements spin‑wait loops with the PAUSE instruction. Intel’s documentation shows that PAUSE latency grew from ~10 cycles on previous generations to up to 140 cycles on Skylake, which directly inflates the time spent in ut_delay.
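To make the effect concrete, the following is a rough back-of-the-envelope sketch rather than MySQL code. It assumes one ut_delay(delay) call issues delay × 50 PAUSE instructions (the hard-coded factor in 5.7), ignores loop overhead, and takes the ~10 and ~140 cycle figures quoted above at face value. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rough cycles spent inside one ut_delay(delay) call, ignoring loop overhead:
// delay * multiplier PAUSE instructions, each costing pause_cycles.
// (Hypothetical helper, not part of MySQL.)
std::uint64_t ut_delay_cycles(std::uint64_t delay, std::uint64_t multiplier,
                              std::uint64_t pause_cycles) {
    return delay * multiplier * pause_cycles;
}

// How many times longer the same call takes when only PAUSE latency changes.
double slowdown(std::uint64_t old_pause_cycles, std::uint64_t new_pause_cycles) {
    assert(old_pause_cycles > 0);
    return static_cast<double>(new_pause_cycles) /
           static_cast<double>(old_pause_cycles);
}
```

With delay = 6 (the default of innodb_spin_wait_delay) and a multiplier of 50, this model gives about 3,000 cycles per call at 10 cycles per PAUSE and about 42,000 cycles at 140 cycles per PAUSE, a 14-fold increase for identical code.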
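3.3 Impact on write throughput – Contention on InnoDB’s SX locks sends many threads into ut_delay at the same time, leading to noticeable write stalls. In the page‑flush path below, for example, a failed no‑wait attempt falls back to a blocking SX‑lock acquisition: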
if (flush_type == BUF_FLUSH_LIST && is_uncompressed &&
    !rw_lock_sx_lock_nowait(rw_lock, BUF_IO_WRITE)) {
    // lock contention handling
    if (!fsp_is_system_temporary(bpage->id.space())) {
        buf_dblwr_flush_buffered_writes(
            buf_parallel_dblwr_partition(bpage, flush_type));
    } else {
        buf_dblwr_sync_datafiles();
    }
    rw_lock_sx_lock_gen(rw_lock, BUF_IO_WRITE);  // acquire SX lock
}
4. Optimization exploration
4.1 MySQL 5.7 spin parameters – Adjusting innodb_spin_wait_delay and innodb_sync_spin_loops showed limited improvement; smaller values reduced latency but could not meet production requirements.
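The interplay of the two parameters can be illustrated with a simplified spin-then-give-up loop. This is a minimal sketch under stated assumptions, not InnoDB's implementation: SpinConfig, SpinResult, cpu_relax and spin_acquire are hypothetical names, the defaults (30 rounds, delay up to 6, multiplier 50) are assumed to mirror innodb_sync_spin_loops, innodb_spin_wait_delay and the 5.7 hard-coded factor, and the lock is a plain atomic flag.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <random>
#include <thread>

// Hypothetical knobs; the default values are assumptions for illustration.
struct SpinConfig {
    unsigned spin_rounds = 30;       // stands in for innodb_sync_spin_loops
    unsigned max_delay = 6;          // stands in for innodb_spin_wait_delay
    unsigned pause_multiplier = 50;  // PAUSE iterations per unit of delay
};

struct SpinResult {
    bool acquired;         // true if the flag was taken while spinning
    std::uint64_t pauses;  // number of relax hints issued
};

// Issue one PAUSE on x86 with GCC/Clang; otherwise just yield.
inline void cpu_relax() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_ia32_pause();
#else
    std::this_thread::yield();
#endif
}

// Try to take the flag; after each failed attempt, spin for a random delay.
// When all rounds are used up, report failure so the caller can block instead.
SpinResult spin_acquire(std::atomic<bool>& locked, const SpinConfig& cfg,
                        std::mt19937& rng) {
    assert(cfg.pause_multiplier > 0);
    std::uniform_int_distribution<unsigned> pick_delay(0, cfg.max_delay);
    std::uint64_t pauses = 0;
    for (unsigned round = 0; round < cfg.spin_rounds; ++round) {
        bool expected = false;
        if (locked.compare_exchange_strong(expected, true,
                                           std::memory_order_acquire)) {
            return {true, pauses};
        }
        const std::uint64_t iterations =
            static_cast<std::uint64_t>(pick_delay(rng)) * cfg.pause_multiplier;
        for (std::uint64_t i = 0; i < iterations; ++i) {
            cpu_relax();
        }
        pauses += iterations;
    }
    return {false, pauses};
}
```

In this model a thread that never gets the lock issues at most 30 × 6 × 50 = 9,000 PAUSE hints before giving up. Lowering either parameter shrinks that bound, but the cost of each individual PAUSE is still set by the CPU.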
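4.2 MySQL 8.0 spin‑wait multiplier – MySQL 8.0 introduces spin_wait_pause_multiplier (exposed as the innodb_spin_wait_pause_multiplier system variable), which replaces the hard‑coded factor of 50 PAUSE iterations per unit of delay. The team back‑ported this patch to the stable 5.7 branch and set the multiplier to 5, one tenth of the default of 50, to compensate for the roughly 14‑fold higher PAUSE latency on the Silver 4110.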
ulint ut_delay(ulint delay) {
    const ulint iterations = delay * ut::spin_wait_pause_multiplier;
    UT_LOW_PRIORITY_CPU();
    ulint j = 0;
    for (ulint i = 0; i < iterations; i++) {
        j += i;
        UT_RELAX_CPU();  // PAUSE
    }
    UT_RESUME_PRIORITY_CPU();
    return j;
}
namespace ut {
ulong spin_wait_pause_multiplier = 50;
}
4.3 CPU‑level PAUSE reduction – Cascade Lake (second‑generation Purley) reduced PAUSE latency to ~44 cycles, which the team confirmed with perf diff measurements.
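The software and hardware options can be compared with some simple arithmetic. The sketch below is illustrative only: it assumes the cost of one unit of delay is multiplier × PAUSE latency, uses the approximate latencies quoted in this article (~10, ~140 and ~44 cycles), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Cycles spent per unit of delay: `multiplier` PAUSEs of `pause_cycles` each.
std::uint64_t cycles_per_delay_unit(std::uint64_t multiplier,
                                    std::uint64_t pause_cycles) {
    return multiplier * pause_cycles;
}

// Multiplier (at least 1) that keeps the per-unit cost on the target CPU
// closest to the baseline CPU's cost.
std::uint64_t matching_multiplier(std::uint64_t baseline_multiplier,
                                  std::uint64_t baseline_pause_cycles,
                                  std::uint64_t target_pause_cycles) {
    assert(target_pause_cycles > 0);
    const double exact =
        static_cast<double>(baseline_multiplier * baseline_pause_cycles) /
        static_cast<double>(target_pause_cycles);
    const long long rounded = std::llround(exact);
    return rounded < 1 ? 1u : static_cast<std::uint64_t>(rounded);
}
```

Under these assumptions the default multiplier of 50 costs about 500 cycles per delay unit on older CPUs, 7,000 on Skylake and 2,200 on Cascade Lake, while a multiplier of 5 on Skylake costs about 700. An exact match to the old budget would need a multiplier of roughly 4 on Skylake and 11 on Cascade Lake, so the value of 5 lands close to the pre-Skylake behavior.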
5. Summary
Back‑porting the spin_wait_pause_multiplier patch to 5.7 (or upgrading to MySQL 8.0) reduces the number of PAUSE instructions executed per spin, which shortens the time spent in ut_delay and restores throughput on Skylake servers.
Upgrading the OS from CentOS 6 to CentOS 7 also yields modest spin‑lock improvements.
Replacing Skylake CPUs with Cascade Lake hardware provides an additional performance boost because of the reduced PAUSE latency.
The findings illustrate how CPU micro‑architectural changes (PAUSE latency) interact with database spin‑wait mechanisms and how targeted software patches can mitigate the impact.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
