
Why Intel’s PAUSE Instruction Slows MySQL on Skylake CPUs and How to Fix It

This article examines how the increased latency of Intel's PAUSE instruction on Skylake CPUs creates a performance bottleneck for MySQL, details the profiling steps that identified ut_delay as the hotspot, and presents a series of optimizations—including spin‑wait parameter tuning and a MySQL 8.0 patch—that restore throughput on affected hardware.

dbaplus Community

Background

MySQL has become the de‑facto relational database for many internet services due to its open‑source nature, mature ecosystem, and continuous feature improvements. In 2019 Meituan began deploying MySQL on Intel Purley‑based Skylake servers (e.g., Silver 4110), expecting a 10% performance gain over the previous E5‑2620 V4 generation.

However, as the number of Skylake servers grew, the DBA team observed higher CPU load and reduced TPS on several instances.

Performance Issue Analysis

Benchmarking across the Grantley (Xeon E5) and Purley (Skylake) platforms showed that in the oltp_write_only workload the Skylake Silver 4110 performed noticeably worse, especially on CentOS 7 compared to CentOS 6.

Further investigation focused on three areas: Intel CPU characteristics, the ut_delay function, and the PAUSE instruction.

CPU Performance Tracing

Using perf top, perf record, and flame graphs, the team pinpointed ut_delay as the function consuming the majority of CPU cycles.

The flame graphs showed ut_delay dominating the call stack.

The ut_delay implementation uses a spin‑wait loop that executes the PAUSE instruction many times:

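typedef unsigned long ulint;  /* InnoDB's unsigned word-sized integer */

/* x86 spin-wait hint emitted on every loop iteration */
#define UT_RELAX_CPU() __asm__ __volatile__("pause")
/* CPU priority hints; no-ops on x86 builds */
#define UT_LOW_PRIORITY_CPU() ((void)0)
#define UT_RESUME_PRIORITY_CPU() ((void)0)

/* Busy-waits for delay * 50 PAUSE iterations; the running sum j is
   returned only to give the loop an observable result. */
ulint ut_delay(ulint delay) {
  ulint i, j = 0;
  UT_LOW_PRIORITY_CPU();
  for (i = 0; i < delay * 50; i++) {
    j += i;
    UT_RELAX_CPU();
  }
  UT_RESUME_PRIORITY_CPU();
  return j;
}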

Intel’s documentation shows that the PAUSE latency grew from ~10 cycles on older micro‑architectures to up to 140 cycles on Skylake, causing longer spin‑wait periods.
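A rough way to check the PAUSE cost on a given host is a microbenchmark like the following. It is a sketch that assumes an x86-64 build with GCC or Clang (for <x86intrin.h>). It counts time-stamp-counter (TSC) ticks, which advance at a fixed reference rate rather than the current core clock, so the result approximates the core-cycle figures above rather than reproducing them. measure_pause_ticks is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__x86_64__)
#include <x86intrin.h>  // __rdtsc and _mm_pause (GCC/Clang)

// Hypothetical helper: average TSC ticks per PAUSE over `iterations` runs.
// TSC ticks at a constant reference rate, not the core clock, so this only
// approximates the per-PAUSE core-cycle cost.
double measure_pause_ticks(std::uint64_t iterations) {
  assert(iterations > 0);
  const std::uint64_t start = __rdtsc();
  for (std::uint64_t i = 0; i < iterations; ++i) {
    _mm_pause();  // the same instruction UT_RELAX_CPU() emits on x86
  }
  const std::uint64_t end = __rdtsc();
  return static_cast<double>(end - start) / static_cast<double>(iterations);
}
#else
// Non-x86-64 targets have no PAUSE instruction; -1.0 means "not measured".
double measure_pause_ticks(std::uint64_t) { return -1.0; }
#endif
```

Calling measure_pause_ticks(1000000) several times and keeping the lowest value reduces noise from interrupts and frequency changes.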

PAUSE and ut_delay Impact

The increased PAUSE latency directly inflates the execution time of ut_delay, which in turn reduces MySQL’s overall throughput, especially for write‑heavy workloads that rely on InnoDB SX locks.

Two key factors were identified:

The spin-wait length: in MySQL 5.7, each ut_delay(delay) call runs delay * 50 PAUSE iterations, where delay is a random value bounded by innodb_spin_wait_delay.

The per-instruction cost of PAUSE on the specific CPU.
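The two factors multiply. As a rough model, the PAUSE cycles spent in one ut_delay(delay) call are delay × 50 × PAUSE latency. The sketch below evaluates that product at an upper bound of delay = 6 (the 5.7 default for innodb_spin_wait_delay) using the latencies cited in this article. pause_cycles_per_call is a hypothetical helper, and the model ignores the loop's own add and compare instructions.

```cpp
#include <cstdint>

// Hypothetical helper: approximate core cycles one ut_delay(delay) call spends
// in PAUSE, ignoring the loop's own add/compare instructions.
constexpr std::uint64_t pause_cycles_per_call(std::uint64_t delay,
                                              std::uint64_t multiplier,
                                              std::uint64_t pause_latency) {
  return delay * multiplier * pause_latency;
}

// Upper bound under MySQL 5.7 defaults: delay = innodb_spin_wait_delay = 6,
// and the loop multiplier is the hard-coded 50.
constexpr std::uint64_t kDelay = 6;
constexpr std::uint64_t kMultiplier = 50;

// PAUSE latencies (cycles) cited in this article, checked at compile time.
static_assert(pause_cycles_per_call(kDelay, kMultiplier, 10) == 3000,
              "pre-Skylake, ~10 cycles per PAUSE");
static_assert(pause_cycles_per_call(kDelay, kMultiplier, 44) == 13200,
              "Cascade Lake, 44 cycles per PAUSE");
static_assert(pause_cycles_per_call(kDelay, kMultiplier, 140) == 42000,
              "Skylake, up to 140 cycles per PAUSE");
```

At that bound, a call costs about 3,000 PAUSE cycles at 10 cycles per PAUSE but about 42,000 at 140, the same 14-fold ratio as the instruction latency itself. The 44-cycle Cascade Lake figure lands at 13,200.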

Optimization Exploration

1. MySQL 5.7 Spin Parameter Tuning

The DBA team adjusted innodb_spin_wait_delay (default 6) and innodb_sync_spin_loops (default 30). Benchmarks showed modest TPS/QPS improvements when innodb_spin_wait_delay was reduced, but gains were insufficient for production needs.
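The two parameters act at different levels. innodb_sync_spin_loops caps how many times a thread polls a busy lock before it stops spinning, and innodb_spin_wait_delay bounds the random ut_delay() pause between polls. The following is a simplified sketch of that spin-then-wait pattern, not InnoDB's actual rw-lock code. The names acquire_with_spin, spin_delay, kSpinLoops and kSpinWaitDelay are hypothetical, std::atomic_flag stands in for the lock word, and std::this_thread::yield() stands in for InnoDB's real blocking wait.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <random>
#include <thread>

// Hypothetical stand-ins for the two server variables, at their 5.7 defaults.
constexpr int kSpinLoops = 30;     // innodb_sync_spin_loops
constexpr int kSpinWaitDelay = 6;  // innodb_spin_wait_delay

// PAUSE-based busy wait modeled on ut_delay(); non-x86-64 builds skip PAUSE.
inline std::uint64_t spin_delay(std::uint64_t delay) {
  std::uint64_t j = 0;
  for (std::uint64_t i = 0; i < delay * 50; ++i) {
    j += i;
#if defined(__x86_64__)
    __asm__ __volatile__("pause");
#endif
  }
  return j;
}

// Simplified spin-then-wait acquisition: poll up to kSpinLoops times with a
// random 0..kSpinWaitDelay pause between polls, then yield and start another
// round (InnoDB would instead block until the lock is released).
// Returns how many polls failed before the lock was acquired.
int acquire_with_spin(std::atomic_flag& lock) {
  thread_local std::minstd_rand rng{12345u};  // fixed seed: deterministic sketch
  std::uniform_int_distribution<int> pick(0, kSpinWaitDelay);
  int failed_polls = 0;
  for (;;) {
    for (int i = 0; i < kSpinLoops; ++i) {
      if (!lock.test_and_set(std::memory_order_acquire)) return failed_polls;
      ++failed_polls;
      (void)spin_delay(static_cast<std::uint64_t>(pick(rng)));
    }
    std::this_thread::yield();
  }
}
```

Lowering innodb_spin_wait_delay shortens every pause between polls, which is why it helps on a slow-PAUSE CPU. Changing innodb_sync_spin_loops only changes how many polls happen before the thread waits.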

2. Porting MySQL 8.0 spin_wait_pause_multiplier

MySQL 8.0 introduces the innodb_spin_wait_pause_multiplier system variable (stored internally as ut::spin_wait_pause_multiplier), which replaces the hard-coded loop count of 50 with a configurable multiplier. The relevant code change is:

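typedef unsigned long ulint;
typedef unsigned long ulong;
/* UT_RELAX_CPU and the priority macros are as in the 5.7 listing. */

namespace ut {
/* Backs innodb_spin_wait_pause_multiplier; 50 reproduces the 5.7 loop count.
   Declared before ut_delay() so the name is in scope there. */
ulong spin_wait_pause_multiplier = 50;
}  // namespace ut

ulint ut_delay(ulint delay) {
  ulint j = 0;  /* running sum returned as the loop's result */
  /* The hard-coded 50 becomes a tunable multiplier. */
  const ulint iterations = delay * ut::spin_wait_pause_multiplier;
  UT_LOW_PRIORITY_CPU();
  for (ulint i = 0; i < iterations; i++) {
    j += i;
    UT_RELAX_CPU();
  }
  UT_RESUME_PRIORITY_CPU();
  return j;
}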

By back-porting this patch to the stable 5.7 branch and setting the multiplier to 5 (one tenth of the default 50, roughly offsetting the ~14× PAUSE latency increase from about 10 to 140 cycles), the team achieved a substantial performance uplift on Silver 4110.
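The choice of multiplier is a ratio. To hold the PAUSE cycles per unit of delay at the pre-Skylake level, scale the default 50 by old PAUSE latency / new PAUSE latency. The sketch below uses the 10- and 140-cycle figures quoted earlier. compensating_multiplier and cycles_per_delay_unit are hypothetical helper names, and the model ignores everything in the loop except PAUSE.

```cpp
#include <cstdint>

// Hypothetical helper: multiplier that keeps PAUSE cycles per delay unit at the
// baseline level, rounded to the nearest integer. new_pause_cycles must be > 0.
constexpr std::uint64_t compensating_multiplier(std::uint64_t base_multiplier,
                                                std::uint64_t old_pause_cycles,
                                                std::uint64_t new_pause_cycles) {
  return (base_multiplier * old_pause_cycles + new_pause_cycles / 2) /
         new_pause_cycles;
}

// PAUSE cycles spent per unit of `delay` for a given multiplier.
constexpr std::uint64_t cycles_per_delay_unit(std::uint64_t multiplier,
                                              std::uint64_t pause_cycles) {
  return multiplier * pause_cycles;
}

// 50 * 10 / 140 = 3.57..., which rounds to 4.
static_assert(compensating_multiplier(50, 10, 140) == 4, "exact compensation");
static_assert(cycles_per_delay_unit(50, 10) == 500, "default 50, 10-cycle PAUSE");
static_assert(cycles_per_delay_unit(50, 140) == 7000, "default 50 on Skylake");
static_assert(cycles_per_delay_unit(5, 140) == 700, "multiplier 5 on Skylake");
```

The exact ratio is about 3.6 (rounded to 4 here). The team's value of 5 costs 700 cycles per delay unit on a 140-cycle PAUSE. That is close to the 500 that the default 50 cost on a 10-cycle PAUSE, and one tenth of the 7,000 the unpatched default costs on Skylake.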

The benchmarks showed that the patched 5.7 build on Silver 4110 outperforms the unpatched build and even exceeds the older E5-2620 V4 in most scenarios, except write-only workloads with more than 64 concurrent threads.

3. CPU‑Level PAUSE Optimization

Testing on Cascade Lake (e.g., Silver 4210) showed the PAUSE latency reduced to 44 cycles, yielding an 8% lower ut_delay overhead than on the Skylake Silver 4110. Performance at 128 threads also improved, confirming the benefit of the newer microarchitecture.

Conclusion

Intel’s decision to increase PAUSE instruction latency on Skylake CPUs can severely degrade MySQL performance in spin‑lock‑heavy workloads. Effective mitigation strategies include:

Upgrading to MySQL 8.0 (or back-porting its patch to 5.7) and lowering innodb_spin_wait_pause_multiplier to compensate for the CPU's actual PAUSE cycle count.

Fine‑tuning innodb_spin_wait_delay (though gains are limited).

Upgrading the operating system to CentOS 7, which offers improved spin‑lock handling.

Replacing Skylake CPUs with newer generations (Cascade Lake or later) where PAUSE latency has been reduced.

Applying the patch and multiplier adjustment restored throughput on Silver 4110 to near‑original levels, demonstrating that software‑level tuning can compensate for hardware‑level latency increases.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
