
Root Cause Analysis of MySQL Memory Spike Caused by Excessive dentry Allocation from Mis‑configured yum makecache Cron

The article details a MySQL memory‑usage incident at Qunar where abnormal slab memory, especially dentry allocation, caused a rapid increase in used memory after a mis‑configured yum makecache cron job, and explains the investigation steps, Linux memory concepts, diagnostic commands, and the corrective actions taken.

Qunar Tech Salon

Author: Gao Wenjia joined the Qunar DBA team in September 2021 and is responsible for hotel and payment business database management and operations, with many years of DB operations experience.

1. Background

During routine MySQL server inspection or fault diagnosis, operators first examine OS-level CPU, memory, storage, and network metrics. High kernel-mode CPU usage (sy) often indicates heavy database concurrency causing frequent context switches. Qunar deploys MySQL on dedicated servers in a single-machine, multi-instance architecture and tightly controls memory via innodb_buffer_pool_size and the PXC gcache.size. Memory-usage alerts are therefore rare, so when a sudden surge of them severely impacted service stability, we built a custom alarm dashboard to view recent alarm ratios.

2. Problem Analysis

We monitor four memory metrics: used, cache, buffers, and free. When a memory-usage alarm fires, we initially clear caches using:

# Flush dirty pages to disk
sync
# Drop the page cache, dentries, and inodes
echo 3 > /proc/sys/vm/drop_caches

After dropping caches, free memory and usage percentages return to normal, but a few days later the server again triggers memory‑usage alarms, with used physical memory increasing at roughly 3 GB per day and free memory decreasing at the same rate.

To identify what drop_caches actually cleared, we examined /proc/meminfo and /proc/slabinfo:

# Total physical memory used by all processes (Pss includes a proportional share of shared memory)
grep Pss /proc/[1-9]*/smaps | awk '{total+=$2}; END {printf "%4.2f GB\n", total/1024/1024}'
# Inspect /proc/meminfo
cat /proc/meminfo
# Output
MemTotal:       132030344 kB
MemFree:         1396884 kB
Buffers:          409812 kB
Cached:         53136072 kB
Slab:           16681824 kB
SReclaimable:   16592540 kB
SUnreclaim:        89284 kB

Buffers are temporary storage for raw disk blocks; Cache stores filesystem pages to avoid repeated disk reads; Slab is a kernel memory allocator for frequently allocated objects. The server showed an unusually high Slab usage of 16.6 GB, which is abnormal for MySQL instances.
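As a quick sanity check, the slab figures in /proc/meminfo can be summarized with a short awk one-liner. This is a sketch: the field names match the output above, and the GB conversion and percentage layout are ours.

```shell
# Summarize slab usage from /proc/meminfo (values are in kB; convert to GB)
awk '
  /^Slab:/         { slab  = $2 }
  /^SReclaimable:/ { rec   = $2 }
  /^SUnreclaim:/   { unrec = $2 }
  END {
    printf "Slab: %.1f GB (reclaimable %.1f GB, unreclaimable %.1f GB)\n",
           slab/1024/1024, rec/1024/1024, unrec/1024/1024
  }' /proc/meminfo
```

On the server above this reports roughly 15.9 GB of slab, of which almost all is SReclaimable, i.e. caches the kernel can shrink under pressure, which is exactly what drop_caches releases.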

Using slabtop we found the top slab type:

## Show the top slab cache types by memory usage
slabtop --sort=c -o | head -n 17
Active / Total Objects (% used)    : 87179810 / 89818826 (97.1%)
Active / Total Slabs (% used)      : 4166722 / 4166821 (100.0%)
Active / Total Caches (% used)     : 105 / 184 (57.1%)
Active / Total Size (% used)       : 15364477.91K / 15662436.88K (98.1%)
... 
73443720    73407245    99%         0.19K       3672186         20      14688744K       dentry
...

The dentry slab consumed about 14.6 GB. In Linux, every file and socket is represented by an inode and a directory entry (dentry). The dentry cache holds directory entries in memory.
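The cache-size column in the slabtop line is easy to cross-check: each 4 KB slab page holds 20 of the 0.19 KB dentry objects, so 3,672,186 slabs account for 14,688,744 KB, matching the CACHE SIZE column. A back-of-the-envelope conversion (integer shell arithmetic rounds down):

```shell
# Cross-check the slabtop dentry line: slabs x 4 KB per slab page, expressed in GB
slabs=3672186
page_kb=4
echo "dentry cache ~ $(( slabs * page_kb / 1024 / 1024 )) GB"
```

This confirms that roughly 14 GB of the 16.6 GB slab total is dentry objects alone.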

3. Page‑Cache Analysis

On Linux kernel ≥ 4.1 we can use the BCC tools cachestat and cachetop; on older kernels, tools such as hcache or vmtouch fill the gap. An example with vmtouch shows that the MySQL ib_logfile and binlog files occupy most of the page cache:

## Page-cache residency of the ib_logfile files in the MySQL data directory
./vmtouch -v /mysql_xxxx/data/ib_logfile*
... Resident Pages: 1048576/1048576 4G/4G 100%
## Page-cache residency of the MySQL binlog directory
./vmtouch -v /mysql_xxxx/binlog/
... Resident Pages: 4922439/5315467 18G/20G 92.6%

4. Directory‑Entry Investigation

Using slabtop we identified dentry as the main contributor to the memory growth. A simple script was used to log dentry statistics every second:

while true; do
    date_str=$(date "+%Y-%m-%d %H:%M:%S")
    memory_info=$(grep SReclaimable /proc/meminfo)
    dentry_info=$(grep dentry /proc/slabinfo)
    echo "${date_str}    ${memory_info}"
    echo "${date_str}    ${dentry_info}"
    sleep 1
done

The log showed dentry memory increasing by ~2 MB each minute. Further analysis of the log with SystemTap scripts revealed that the yum process performed many d_alloc operations without matching d_free calls.

## dentry.stp
probe kernel.function("d_alloc") {
    printf("%s[%ld] %s %s\n", execname(), pid(), pp(), probefunc())
}
probe kernel.function("d_free") {
    printf("%s[%ld] %s %s\n", execname(), pid(), pp(), probefunc())
}
probe timer.s(5) { exit() }

Running the script and summarising the log gave:

15049 BackgrProcPool[13267]-->d_alloc
14707 BackgrProcPool[13267]-->d_free
8367 yum[8606]-->d_alloc
...
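The summarisation itself is a one-liner over the SystemTap log. The sample lines below only illustrate the script's output format (execname[pid], probe point, function); they are not captured data:

```shell
# Build a small sample log in the dentry.stp output format (illustrative data)
cat > /tmp/dentry.log <<'EOF'
yum[8606] kernel.function("d_alloc") d_alloc
yum[8606] kernel.function("d_alloc") d_alloc
mysqld[13267] kernel.function("d_free") d_free
EOF
# Count events per process/function pair, most frequent first
awk '{print $1 "-->" $NF}' /tmp/dentry.log | sort | uniq -c | sort -rn
```

Run against the real log, this is the counting step that produced the summary above, where yum's d_alloc calls stand out with no matching d_free activity.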

The yum process was identified as the main source of excessive dentry allocations. The underlying cause was a cron job that executed yum makecache every minute instead of the intended hourly schedule:

* */1 * * * root yum makecache --enablerepo=xxxxxx_repo;yum -q -y update xxxxxx --enablerepo=xxxxxx_repo;

Because the minute field is `*`, the job matched every minute of every hour instead of firing once per hour, a 60× increase in execution frequency that rapidly inflated dentry usage.

5. Solution

The immediate fix was to restore the correct cron schedule (once per hour) for yum makecache. Long‑term measures include removing the dependency on frequent yum makecache by adopting proactive package push mechanisms and tightening operational procedures with pre‑review, in‑process checks, and post‑validation to reduce human error.
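A corrected schedule pins the minute field so the job fires once per hour. This is an illustrative fix; the repo and package names are placeholders carried over from the original entry:

```shell
# /etc/cron.d entry: run at minute 0 of every hour instead of every minute
0 * * * * root yum makecache --enablerepo=xxxxxx_repo; yum -q -y update xxxxxx --enablerepo=xxxxxx_repo
```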

