Why Intel’s PAUSE Instruction Slows MySQL on Skylake CPUs and How to Fix It
This article examines how the increased latency of Intel's PAUSE instruction on Skylake CPUs creates a performance bottleneck for MySQL, details the profiling steps that identified ut_delay as the hotspot, and presents a series of optimizations, including spin‑wait parameter tuning and a back‑port of a MySQL 8.0 change, that restore throughput on affected hardware.
Background
MySQL has become the de‑facto relational database for many internet services due to its open‑source nature, mature ecosystem, and continuous feature improvements. In 2019 Meituan began deploying MySQL on Intel Purley‑based Skylake servers (e.g., Silver 4110), expecting a 10% performance gain over the previous E5‑2620 V4 generation.
However, as the number of Skylake servers grew, the DBA team observed higher CPU load and reduced TPS on several instances.
Performance Issue Analysis
Benchmarking across the Grantley (E5‑series) and Purley (Skylake) platforms revealed that in the oltp_write_only workload the Skylake 4110 performed noticeably worse, especially on CentOS 7 compared to CentOS 6.
Further investigation focused on three areas: Intel CPU characteristics, the ut_delay function, and the PAUSE instruction.
CPU Performance Tracing
Using perf top, perf record, and flame graphs, the team pinpointed ut_delay as the function consuming the majority of CPU cycles.
Flame‑graph snapshots (shown below) illustrate that ut_delay dominates the call stack.
The ut_delay implementation uses a spin‑wait loop that executes the PAUSE instruction many times:
#define UT_RELAX_CPU() asm("pause")
ulint ut_delay(ulint delay) {
  ulint i, j = 0;
  UT_LOW_PRIORITY_CPU();
  for (i = 0; i < delay * 50; i++) {
    j += i;
    UT_RELAX_CPU();
  }
  UT_RESUME_PRIORITY_CPU();
  return j;
}
Intel’s documentation shows that PAUSE latency grew from about 10 cycles on older micro‑architectures to as many as 140 cycles on Skylake, so every iteration of this loop, and therefore every spin‑wait, takes correspondingly longer.
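One way to check how expensive PAUSE is on a given machine is to time a long run of them. The following is a minimal sketch, not taken from the original analysis: it assumes GCC or Clang inline assembly, the names relax_cpu, ns_per_pause, and approx_cycles_per_pause are hypothetical, and the clock frequency has to be supplied by the reader. Frequency scaling and timer overhead make the result approximate, and on non-x86 targets the hint degrades to a compiler barrier, so the number is only meaningful on x86.

```cpp
#include <chrono>
#include <cstdint>

// Issue one spin-wait hint. On x86 this is the PAUSE instruction; on other
// targets it is only a compiler barrier so that the sketch still compiles.
inline void relax_cpu() {
#if defined(__x86_64__) || defined(__i386__)
  __asm__ __volatile__("pause");
#else
  __asm__ __volatile__("" ::: "memory");
#endif
}

// Average wall-clock nanoseconds per relax_cpu() call over `iterations` calls.
double ns_per_pause(std::uint64_t iterations) {
  if (iterations == 0) return 0.0;
  const auto start = std::chrono::steady_clock::now();
  for (std::uint64_t i = 0; i < iterations; ++i) {
    relax_cpu();
  }
  const auto stop = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(iterations);
}

// Convert nanoseconds to approximate core cycles for an assumed clock in GHz
// (1 GHz = 1 cycle per nanosecond).
double approx_cycles_per_pause(double ns, double assumed_ghz) {
  return ns * assumed_ghz;
}
```

At an assumed 2.1 GHz clock, for example, a reading of about 67 ns per call would correspond to roughly 140 cycles, while a reading near 5 ns would correspond to roughly 10 cycles.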
PAUSE and ut_delay Impact
The increased PAUSE latency directly inflates the execution time of ut_delay, which in turn reduces MySQL’s overall throughput, especially for write‑heavy workloads that rely on InnoDB SX locks.
Two key factors were identified:
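The number of PAUSE iterations per ut_delay call, which in MySQL 5.7 is the delay argument (bounded by innodb_spin_wait_delay) multiplied by a hard‑coded 50.
The number of cycles a single PAUSE instruction takes on the specific CPU.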
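The two factors multiply, which a small cost model makes concrete. This is a rough sketch under stated assumptions: it uses the approximate 10‑cycle and 140‑cycle PAUSE latencies quoted above, ignores the loop's own overhead, and the helper names pauses_per_call and cycles_per_call are made up for illustration.

```cpp
#include <cstdint>

// Hard-coded loop factor in the MySQL 5.7 ut_delay listing.
constexpr std::uint64_t kLoopFactor57 = 50;

// Number of PAUSE instructions executed by one ut_delay(delay) call.
constexpr std::uint64_t pauses_per_call(std::uint64_t delay,
                                        std::uint64_t multiplier = kLoopFactor57) {
  return delay * multiplier;
}

// Approximate cycles spent in one ut_delay(delay) call, counting only PAUSE.
constexpr std::uint64_t cycles_per_call(std::uint64_t delay,
                                        std::uint64_t pause_cycles,
                                        std::uint64_t multiplier = kLoopFactor57) {
  return pauses_per_call(delay, multiplier) * pause_cycles;
}

// delay = 6 is the default innodb_spin_wait_delay, used here as the upper bound.
static_assert(cycles_per_call(6, 10) == 3000, "about 10-cycle PAUSE");
static_assert(cycles_per_call(6, 140) == 42000, "about 140-cycle PAUSE");
```

With the same settings, one call at the maximum delay costs about 3,000 cycles with a 10‑cycle PAUSE and about 42,000 cycles with a 140‑cycle PAUSE, a 14‑fold difference that comes entirely from the hardware.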
Optimization Exploration
1. MySQL 5.7 Spin Parameter Tuning
The DBA team adjusted innodb_spin_wait_delay (default 6) and innodb_sync_spin_loops (default 30). Benchmarks showed modest TPS/QPS improvements when innodb_spin_wait_delay was reduced, but gains were insufficient for production needs.
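To see how the two parameters interact, it can help to look at a simplified spin‑then‑yield lock. This is an illustrative model only, not InnoDB's actual mutex code: SpinConfig, SpinThenYieldLock, and relax_cpu are hypothetical names, the fields merely mirror innodb_spin_wait_delay, innodb_sync_spin_loops, and the loop factor of 50, and the sketch yields after the spin rounds where a real server would typically park the thread on a wait queue.

```cpp
#include <atomic>
#include <cstdint>
#include <random>
#include <thread>

// PAUSE on x86 (GCC/Clang syntax assumed); a compiler barrier elsewhere.
inline void relax_cpu() {
#if defined(__x86_64__) || defined(__i386__)
  __asm__ __volatile__("pause");
#else
  __asm__ __volatile__("" ::: "memory");
#endif
}

// Hypothetical stand-ins for the tunables discussed in the text.
struct SpinConfig {
  std::uint32_t spin_wait_delay = 6;    // upper bound of the random delay
  std::uint32_t sync_spin_loops = 30;   // spin rounds before giving up the CPU
  std::uint32_t pause_multiplier = 50;  // PAUSEs per unit of delay
};

class SpinThenYieldLock {
 public:
  explicit SpinThenYieldLock(SpinConfig cfg = SpinConfig()) : cfg_(cfg) {}

  bool try_lock() {
    return !locked_.exchange(true, std::memory_order_acquire);
  }

  void lock() {
    thread_local std::minstd_rand rng{std::random_device{}()};
    std::uniform_int_distribution<std::uint32_t> pick(0, cfg_.spin_wait_delay);
    for (;;) {
      for (std::uint32_t round = 0; round < cfg_.sync_spin_loops; ++round) {
        if (try_lock()) return;
        // Random back-off: between 0 and spin_wait_delay * pause_multiplier PAUSEs.
        const std::uint64_t pauses =
            static_cast<std::uint64_t>(pick(rng)) * cfg_.pause_multiplier;
        for (std::uint64_t i = 0; i < pauses; ++i) relax_cpu();
      }
      std::this_thread::yield();  // spin budget exhausted; let others run
    }
  }

  void unlock() { locked_.store(false, std::memory_order_release); }

 private:
  SpinConfig cfg_;
  std::atomic<bool> locked_{false};
};
```

In this model a waiter executes at most 30 × 6 × 50 = 9,000 PAUSE instructions before yielding. At about 10 cycles each that is roughly 90,000 cycles; at about 140 cycles it is roughly 1.26 million. Lowering the delay shrinks this budget, but it also changes the back‑off spacing between lock attempts, which is consistent with tuning alone giving only modest gains.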
2. Porting MySQL 8.0 spin_wait_pause_multiplier
MySQL 8.0 replaces the hard‑coded loop count (50) with a configurable multiplier, held in ut::spin_wait_pause_multiplier and exposed as the innodb_spin_wait_pause_multiplier system variable. The relevant code change is:
namespace ut { ulong spin_wait_pause_multiplier = 50; }
ulint ut_delay(ulint delay) {
  ulint j = 0;
  /* The loop count is now configurable instead of a fixed delay * 50. */
  const ulint iterations = delay * ut::spin_wait_pause_multiplier;
  UT_LOW_PRIORITY_CPU();
  for (ulint i = 0; i < iterations; i++) {
    j += i;
    UT_RELAX_CPU();
  }
  UT_RESUME_PRIORITY_CPU();
  return j;
}
By back‑porting this patch to the stable 5.7 branch and setting the multiplier to 5 (one tenth of the default of 50, which offsets most of the roughly 14‑fold increase in PAUSE latency on the Silver 4110), the team achieved a substantial performance uplift.
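The choice of multiplier can be sanity‑checked with simple arithmetic: to keep the spin duration roughly constant, the multiplier should shrink by the same factor that PAUSE latency grew. The sketch below assumes the approximate latencies quoted in this article; scaled_multiplier and spin_cycles are hypothetical helper names, not MySQL functions.

```cpp
#include <cstdint>

// Multiplier that keeps delay * multiplier * pause_cycles roughly constant
// when PAUSE latency changes. Rounds to nearest and never returns less than 1.
inline std::uint64_t scaled_multiplier(std::uint64_t base_multiplier,
                                       std::uint64_t base_pause_cycles,
                                       std::uint64_t new_pause_cycles) {
  if (new_pause_cycles == 0) return base_multiplier;  // nothing to scale by
  const std::uint64_t scaled =
      (base_multiplier * base_pause_cycles + new_pause_cycles / 2) /
      new_pause_cycles;
  return scaled == 0 ? 1 : scaled;
}

// Approximate cycles for one ut_delay(delay) call, counting only PAUSE.
inline std::uint64_t spin_cycles(std::uint64_t delay, std::uint64_t multiplier,
                                 std::uint64_t pause_cycles) {
  return delay * multiplier * pause_cycles;
}
```

Under these assumptions, scaled_multiplier(50, 10, 140) gives 4, close to the value of 5 used above. With a multiplier of 5 and a delay of 6, one call costs about 4,200 cycles on a 140‑cycle PAUSE, compared with about 3,000 cycles for the original settings on a 10‑cycle PAUSE and about 42,000 cycles for the unpatched settings on Skylake.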
Benchmark graphs demonstrate that the patched 5.7 version on Silver 4110 outperforms the unpatched version and even exceeds the older E5‑2620 V4 in most scenarios, except for extreme write‑only workloads with >64 concurrent threads.
3. CPU‑Level PAUSE Optimization
Testing on Cascade Lake (e.g., the Silver 4210) showed PAUSE latency reduced to 44 cycles, yielding an 8% lower ut_delay overhead compared to the Skylake 4110. Performance under 128 threads improved, confirming the benefit of newer micro‑architectures.
Conclusion
Intel’s decision to increase PAUSE instruction latency on Skylake CPUs can severely degrade MySQL performance in spin‑lock‑heavy workloads. Effective mitigation strategies include:
Upgrading to MySQL 8.0, or back‑porting its multiplier change to 5.7, and lowering innodb_spin_wait_pause_multiplier in proportion to the CPU's higher PAUSE latency.
Fine‑tuning innodb_spin_wait_delay (though gains are limited).
Upgrading the operating system to CentOS 7, which offers improved spin‑lock handling.
Replacing Skylake CPUs with newer generations (Cascade Lake or later) where PAUSE latency has been reduced.
Applying the patch and multiplier adjustment restored throughput on Silver 4110 to near‑original levels, demonstrating that software‑level tuning can compensate for hardware‑level latency increases.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
