
Analyzing and Optimizing MySQL Performance on Intel Skylake CPUs

Meituan’s DBA team found that Intel Skylake CPUs sharply increased the latency of the PAUSE instruction. As a result, MySQL’s ut_delay spin-wait loops became the top consumer of CPU cycles and write throughput dropped. The team restored performance by back-porting MySQL 8.0’s spin_wait_pause_multiplier patch to 5.7, upgrading to CentOS 7, and moving to Cascade Lake hardware.

Meituan Technology Team

Thanks to its open-source nature and mature ecosystem, MySQL is one of the most widely used relational databases among Internet companies. This article shares a practical case from Meituan’s MySQL DBA team, in which a change to the Intel PAUSE instruction caused a performance bottleneck on Skylake-based servers.

1. Background – In 2017 Intel released the Purley platform with Xeon Scalable processors (Platinum, Gold, Silver, Bronze). Meituan migrated many MySQL instances to Purley/Skylake servers (e.g., Silver 4110). Although the new CPUs promised ~10% higher raw performance, the team observed higher CPU load and lower TPS (transactions per second) after the migration.

2. Performance problem analysis – Benchmarking showed that on OLTP write-only workloads the Purley-based Silver 4110 performed worse than older platforms, and that on the same hardware CentOS 7 gave a modest improvement over CentOS 6.

3. CPU performance tracing

3.1 Hot-spot identification – Using perf top, perf record and flame graphs, the team pinpointed ut_delay as the function consuming the most CPU cycles. An excerpt of the perf call graph for mysqld follows.

# Children      Self  Command  Shared Object        Symbol
93.54%      0.00% mysqld   libpthread-2.17.so   [.] start_thread
    |---start_thread
    |--77.07%--pfs_spawn_thread
    |    |--77.05%--handle_connection
    |        |--76.97%--do_command
    |            |--74.30%--dispatch_command
    |                |--71.16%--mysqld_stmt_execute
    |                    |--70.74%--Prepared_statement::execute_loop
    |                        |--69.53%--Prepared_statement::execute
    |                            |--67.90%--mysql_execute_command
    |                                |--23.43%--trans_commit_stmt
    |                                    |--23.30%--ha_commit_trans
    |                                        |--18.86%--MYSQL_BIN_LOG::commit
    |                                            |--18.18%--MYSQL_BIN_LOG::ordered_commit
    |                                                |--8.02%--MYSQL_BIN_LOG::change_stage
    |                                                    |--2.35%--__lll_unlock_wake
    |                                                        |--2.24%--system_call_fastpath
    |                                                            |--2.23%--do_futex
    |                                                                |--1.38%--wake_up_q
    |                                                                    |--1.33%--try_to_wake_up

3.2 Relationship between ut_delay and the PAUSE instruction – InnoDB’s ut_delay spin-wait loop executes the PAUSE instruction on every iteration. Intel’s documentation shows that PAUSE latency grew from ~10 cycles on previous generations to as many as 140 cycles on Skylake. Because ut_delay runs a fixed number of iterations for a given delay, the time spent in it grows in direct proportion to PAUSE latency.
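
One way to check the PAUSE cost on a given machine is a small timing loop. The following is a minimal sketch, not the team's measurement method: the names cpu_relax and ns_per_pause are hypothetical, and the result includes loop overhead, so it is only a rough upper bound.

```cpp
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
inline void cpu_relax() { _mm_pause(); }  // emits the PAUSE instruction
#else
inline void cpu_relax() {}  // assumption: no PAUSE equivalent on this target
#endif

// Average wall-clock nanoseconds per cpu_relax() call over `iterations` calls.
// Loop overhead is included, so treat the value as an approximation.
inline double ns_per_pause(std::uint64_t iterations) {
    if (iterations == 0) {
        return 0.0;
    }
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        cpu_relax();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Multiplying the returned value by the core clock in GHz gives an approximate cycle count per PAUSE, which can be compared across CPU generations.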

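3.3 Impact on write throughput – Contention on InnoDB’s SX lock sends many threads into ut_delay, and with the slower PAUSE each spin lasts longer, leading to noticeable write stalls. The page-flush path is one place where the SX lock is taken: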

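// Page-flush path: first try to take the page's SX lock without waiting.
if (flush_type == BUF_FLUSH_LIST && is_uncompressed &&
    !rw_lock_sx_lock_nowait(rw_lock, BUF_IO_WRITE)) {
    // The lock is contended. Flush the doublewrite buffer first, because it
    // may hold another block's lock and could otherwise cause a deadlock.
    if (!fsp_is_system_temporary(bpage->id.space())) {
        buf_dblwr_flush_buffered_writes(
            buf_parallel_dblwr_partition(bpage, flush_type));
    } else {
        buf_dblwr_sync_datafiles();
    }
    // Blocking acquire of the SX lock; waiting threads spin via ut_delay.
    rw_lock_sx_lock_gen(rw_lock, BUF_IO_WRITE);
}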

4. Optimization exploration

4.1 MySQL 5.7 spin parameters – Tuning innodb_spin_wait_delay and innodb_sync_spin_loops brought only limited improvement; smaller values reduced latency, but not enough to meet production requirements.
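
To make the roles of the two parameters concrete, here is a simplified spin-then-yield lock. It is a sketch under stated assumptions, not InnoDB's code: SpinConfig, busy_delay, spin_then_yield_lock and spin_unlock are hypothetical names, the defaults mirror the documented MySQL defaults (delay 6, 30 spin loops), and the event wait that InnoDB falls back to after spinning is replaced by a plain yield.

```cpp
#include <atomic>
#include <cstdint>
#include <random>
#include <thread>

// Hypothetical knobs named after the MySQL settings they imitate.
struct SpinConfig {
    unsigned spin_wait_delay = 6;    // upper bound of the random delay per round
    unsigned sync_spin_loops = 30;   // spin rounds before giving up the CPU
    unsigned pause_multiplier = 50;  // loop iterations per unit of delay
};

// Stand-in for ut_delay(): runs delay * multiplier loop iterations.
inline void busy_delay(unsigned delay, unsigned multiplier) {
    const std::uint64_t iterations =
        static_cast<std::uint64_t>(delay) * multiplier;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        // Compiler barrier standing in for PAUSE; it discourages the
        // optimizer from deleting the loop.
        std::atomic_signal_fence(std::memory_order_seq_cst);
    }
}

// Acquires `locked` and returns how many spin rounds were spent waiting.
inline unsigned spin_then_yield_lock(std::atomic<bool>& locked,
                                     const SpinConfig& cfg) {
    thread_local std::mt19937 rng{42};  // fixed seed keeps the sketch simple
    std::uniform_int_distribution<unsigned> delay_dist(0, cfg.spin_wait_delay);
    unsigned rounds = 0;
    for (;;) {
        bool expected = false;
        if (locked.compare_exchange_strong(expected, true,
                                           std::memory_order_acquire)) {
            return rounds;
        }
        if (rounds < cfg.sync_spin_loops) {
            busy_delay(delay_dist(rng), cfg.pause_multiplier);  // spin phase
            ++rounds;
        } else {
            std::this_thread::yield();  // spin budget used up: let others run
        }
    }
}

inline void spin_unlock(std::atomic<bool>& locked) {
    locked.store(false, std::memory_order_release);
}
```

In this model, lowering spin_wait_delay shortens each round and lowering sync_spin_loops caps the number of rounds, but neither changes the cost of a single PAUSE, which is consistent with the limited gains the team saw.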

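4.2 MySQL 8.0 spin-wait multiplier – MySQL 8.0 introduces spin_wait_pause_multiplier (exposed as the innodb_spin_wait_pause_multiplier system variable), which replaces the multiplier of 50 that 5.7 hard-codes in the ut_delay loop. The team back-ported this patch to the stable 5.7 branch and set the multiplier to 5 on the Silver 4110, scaling the default down to offset the roughly 14x longer PAUSE (up to 140 cycles versus ~10).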

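namespace ut {
    // Multiplier applied to the requested delay; 50 matches the value
    // that is hard-coded in 5.7.
    ulong spin_wait_pause_multiplier = 50;
}

ulint ut_delay(ulint delay) {
    // Total number of PAUSE instructions executed by this call.
    const ulint iterations = delay * ut::spin_wait_pause_multiplier;
    UT_LOW_PRIORITY_CPU();
    ulint j = 0;
    for (ulint i = 0; i < iterations; i++) {
        j += i;
        UT_RELAX_CPU(); // PAUSE
    }
    UT_RESUME_PRIORITY_CPU();
    return j;
}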
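
The effect of the multiplier can be estimated with simple arithmetic. The sketch below is a rough cost model, not a measurement: pause_cycles_per_call and scaled_multiplier are hypothetical helpers, and the model counts only PAUSE cycles, ignoring loop overhead.

```cpp
#include <cstdint>

// Estimated cycles spent in PAUSE during one ut_delay(delay) call:
// iterations (delay * multiplier) times the latency of a single PAUSE.
inline std::uint64_t pause_cycles_per_call(std::uint64_t delay,
                                           std::uint64_t multiplier,
                                           std::uint64_t pause_latency) {
    return delay * multiplier * pause_latency;
}

// Multiplier that keeps the per-call cost close to the old baseline when
// PAUSE latency changes. Integer division rounds down; the result is
// clamped to at least 1. new_latency must be non-zero.
inline std::uint64_t scaled_multiplier(std::uint64_t old_multiplier,
                                       std::uint64_t old_latency,
                                       std::uint64_t new_latency) {
    const std::uint64_t scaled = old_multiplier * old_latency / new_latency;
    return scaled == 0 ? 1 : scaled;
}
```

With delay = 6 (the default innodb_spin_wait_delay) and a multiplier of 50, the model gives 3,000 cycles at ~10 cycles per PAUSE and 42,000 cycles at 140. Strict proportional scaling yields a multiplier of 3 (50 × 10 / 140 ≈ 3.6, rounded down); the value of 5 chosen by the team is in the same range and brings the estimate to 4,200 cycles.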

4.3 CPU-level PAUSE reduction – Cascade Lake (the second-generation processors for the Purley platform) reduced PAUSE latency to ~44 cycles, which the team confirmed with perf diff measurements.

5. Summary

Back-porting the spin_wait_pause_multiplier patch to 5.7 (or upgrading to MySQL 8.0) and lowering the multiplier shortens the time each ut_delay call spends in PAUSE and restores throughput on Skylake servers.

Upgrading the OS from CentOS 6 to CentOS 7 also yields modest spin‑lock improvements.

Replacing Skylake CPUs with Cascade Lake hardware provides an additional performance boost because of the reduced PAUSE latency.

The findings illustrate how CPU micro‑architectural changes (PAUSE latency) interact with database spin‑wait mechanisms and how targeted software patches can mitigate the impact.
