Root Cause Analysis of MySQL Crash Triggered by Binlog Errors When the Root Partition Is Full
The article investigates a MySQL crash caused by binlog errors due to a full '/' partition, explains why the error leads to server abort, demonstrates reproducing the issue with large transactions, traces the problem to the my_write function in the source code, and offers mitigation strategies such as reducing transaction size or expanding the temporary directory space.
Problem Phenomenon – In a production project, MySQL crashed with a "MySQL Crash" message. The error log showed a binlog error, and the system log showed that the '/' partition was out of space, confirming that the binlog error resulted from a full disk.
Questions – Why does a binlog error caused by a full '/' partition lead to a MySQL crash? Why does the partition appear to have plenty of space later, and which files actually consume the space?
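Investigation of MySQL Parameters – Two parameters are relevant: binlog_cache_size=32768 (the default, 32 KB) and binlog_error_action=ABORT_SERVER. When a transaction's binlog events exceed binlog_cache_size, MySQL spills them to a temporary file under tmpdir, here /tmp, which resides on the '/' partition. If that write fails, the binlog cannot be written at commit, and binlog_error_action=ABORT_SERVER makes the server abort, which is the observed crash. The temporary files are released once the session ends or the server exits, which is consistent with the partition showing free space again afterward.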
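The interaction between the cache size and the temporary directory can be sketched as a small decision function. This is a simplified model and not MySQL's actual logic: the name classify_transaction and its inputs are hypothetical, and it assumes that whatever does not fit in the in-memory cache must fit on the filesystem that holds tmpdir.

```python
BINLOG_CACHE_SIZE = 32768  # bytes; the default value quoted in the article


def classify_transaction(tx_binlog_bytes, tmpdir_free_bytes,
                         cache_size=BINLOG_CACHE_SIZE):
    """Classify where a transaction's binlog events would end up.

    Simplified model (an assumption, not taken from MySQL source):
    events up to cache_size stay in memory, the overflow goes to a
    temporary file in tmpdir.
    """
    if tx_binlog_bytes <= cache_size:
        return "in-memory"
    overflow = tx_binlog_bytes - cache_size
    if overflow <= tmpdir_free_bytes:
        return "spills-to-tmpdir"
    # Not enough room for the overflow: the temp-file write would fail.
    return "enospc-risk"
```

Under this model, a 5 GiB transaction against a tmpdir with 4 GiB free falls into the "enospc-risk" case, which is the situation that leads to the abort when binlog_error_action is ABORT_SERVER.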
Test Simulation – A local environment was set up with the following filesystem layout:
Filesystem           Size  Used Avail Use% Mounted on
/dev/mapper/cl-root   96G   57G   39G  60% /          <-- datadir = /data/mysql_data
/dev/mapper/cl-tmp   3.8G   33M  3.8G   1% /data/tmp  <-- tmpdir = /data/tmp
A massive transaction (a loop inserting 10 000 rows) was executed. The temporary files grew in /data/tmp, eventually exhausting the space. The lsof command confirmed that the temporary files were being created and growing.
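While running such a test, it can help to watch the free space of the filesystem behind tmpdir. The following is a minimal sketch using Python's standard shutil.disk_usage; the helper names tmpdir_usage and is_low_on_space are hypothetical, and the computed percentage may differ slightly from the Use% column of df, which accounts for reserved blocks.

```python
import shutil


def tmpdir_usage(path):
    """Return (total, used, free, use_percent) for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    # Guard against a zero-sized filesystem to avoid division by zero.
    use_percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    return usage.total, usage.used, usage.free, use_percent


def is_low_on_space(path, min_free_bytes):
    """True if the filesystem holding path has less than min_free_bytes free."""
    return shutil.disk_usage(path).free < min_free_bytes
```

For the layout above one would call tmpdir_usage("/data/tmp"); the path is the article's test value and should be replaced with the actual tmpdir of the server being checked.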
During the test, the following MySQL errors were observed:
ERROR 1026 (HY000): Error writing file '/data/tmp/ML1aShmH' (errno: 28 - No space left on device)
ERROR 1598 (HY000): Binary logging not possible. Message: An error occurred during flush stage of the commit. 'binlog_error_action' is set to 'ABORT_SERVER'. Hence aborting the server.
After the write error occurred in a session, committing the transaction or starting a new one in that same session immediately triggered binlog_error_action, causing the server to abort and log the binlog error.
Source Code Tracing – The failure originates in the my_write function, which ultimately calls the Linux write() system call. When write() fails or writes fewer bytes than requested (here because of ENOSPC, errno 28), MySQL logs the error and returns MY_FILE_ERROR, which propagates up to the binlog commit routine and triggers the abort action. The excerpt below is abbreviated.
size_t my_write(File Filedes, const uchar *Buffer, size_t Count, myf MyFlags) {
  size_t writtenbytes;
  // ... (setup, retry loop and flag checks elided)
  writtenbytes = write(Filedes, Buffer, Count);
  if (writtenbytes != Count) {
    // failed or short write, e.g. errno 28 (ENOSPC): report it and bail out
    char errbuf[MYSYS_STRERROR_SIZE];
    my_error(EE_WRITE, MYF(0), my_filename(Filedes), my_errno(),
             my_strerror(errbuf, sizeof(errbuf), my_errno()));
    DBUG_RETURN(MY_FILE_ERROR);
  }
  // success: the whole buffer was written (return path elided)
}
The trace logs show the exact call stack leading to my_write and the subsequent error messages, confirming that the binlog flush fails because the temporary directory cannot accommodate the data.
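The error path described above can be mirrored in a few lines of Python. This is an illustrative analogue, not a port of the MySQL function: checked_write and the MY_FILE_ERROR value are stand-ins, the write_fn parameter exists only so the failure can be simulated, and mapping a short write to ENOSPC is an assumption made for this sketch.

```python
import errno
import os

MY_FILE_ERROR = -1  # stand-in for the C sentinel; the value is illustrative


def checked_write(fd, data, write_fn=os.write):
    """Write data to fd in one call and report failure on error or short write.

    Returns (bytes_written, None) on success, or
    (MY_FILE_ERROR, errno_code) when the write fails or is incomplete.
    """
    try:
        written = write_fn(fd, data)
    except OSError as exc:
        # write() failed outright, e.g. ENOSPC when the disk is full.
        return MY_FILE_ERROR, exc.errno
    if written != len(data):
        # Assumption for this sketch: treat a short write as a full disk.
        return MY_FILE_ERROR, errno.ENOSPC
    return written, None
```

A caller that receives MY_FILE_ERROR has to decide what to do next; in the server that decision is governed by binlog_error_action.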
Extended Scenario – When using Navicat to restore a large database, the tool also creates a transaction that exceeds binlog_cache_size. If /tmp runs out of space, Navicat reports an error but does not issue a COMMIT, so the server does not crash; the connection is simply closed.
Summary and Recommendations – The crash occurs when a large transaction exceeds binlog_cache_size, causing temporary files to fill the /tmp partition. Increasing binlog_cache_size is not a viable solution because it would dramatically increase memory usage (e.g., 32 MB per connection for 300 connections ≈ 10 GB). The proper mitigation is to reduce transaction size, avoid generating many temporary files concurrently, and enlarge the partition that holds tmpdir.
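Both points in the summary, the memory cost of a larger cache and the benefit of smaller transactions, can be illustrated with a short sketch. The helper names are hypothetical, the memory figure is a worst-case bound that assumes every connection fills its cache at the same time, and the batching helper only splits the rows; issuing each batch as its own transaction is left to the caller.

```python
def worst_case_cache_memory(cache_size_bytes, connections):
    """Upper bound on binlog cache memory if every connection fills its cache."""
    return cache_size_bytes * connections


def batches(rows, batch_size):
    """Yield successive slices of rows, each intended to be one transaction."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]
```

With a 32 MiB cache and 300 connections the bound is 10,066,329,600 bytes (9.375 GiB), in line with the roughly 10 GB quoted above. Committing after each batch keeps every transaction's binlog events small, so less data has to be staged under tmpdir at any one time.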
Open Issues – The problem reproduces on CentOS 7.3 (commit leads to crash) but not on CentOS 7.6 (commit does not crash, though binlog events are partially written).