Understanding MySQL Dump Crashes, GTID‑Purged Bugs, and TokuDB Optimize Anomalies
This article analyzes why redirecting mysqldump output can crash the mysql client on import, explores a GTID_PURGED bug that breaks AUTO_POSITION replication, examines GTID gaps caused by replicate‑do‑db filtering, and explains why TokuDB's OPTIMIZE TABLE appears to increase table size.
MySQL Dump Crash Caused by stderr Redirection
The client crashed when a large dump (~50 GB) was imported with mysql -e 'source test.dmp' because the dump file began with a warning line generated by redirecting both stdout and stderr to the dump file:
mysqldump ... > /test.dmp 2>&1
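The warning line is not valid SQL. It contains an unmatched single quote, so when the client reads it the parser treats the text that follows as part of an unterminated string and keeps allocating memory until it exceeds the limit and crashes. Redirecting stderr to a separate file (for example 2> /test.err instead of 2>&1) keeps such warnings out of the dump.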
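The effect of a stray quote can be illustrated with a small Python sketch. It is not the mysql client's parser; the function name ends_inside_quote and the sample warning text are hypothetical, and the quote handling is simplified to single quotes with backslash escapes.

```python
def ends_inside_quote(text):
    """Return True if text ends while a single-quoted string is still open."""
    in_quote = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quote and ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "'":
            # A doubled quote ('') toggles twice, so the state is unchanged
            in_quote = not in_quote
        i += 1
    return in_quote


# Hypothetical warning text; the real message depends on the mysqldump version
warning = "Warning: remove this line if you don't want it in the dump"
statement = "INSERT INTO t VALUES ('a','it''s');"

print(ends_inside_quote(warning))    # the apostrophe opens a string that never closes
print(ends_inside_quote(statement))  # balanced quotes, statement can terminate
```

Running a check like this over the first few lines of a dump before importing it is a cheap way to catch stderr output that was mixed into the file.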
Key technical limits:
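Maximum batch buffer size defined in the client source file mysql.cc is #define MAX_BATCH_BUFFER_SIZE (1024L * 1024L * 1024L) (1 GB).
The max_allowed_packet variable caps the size of a single packet, and therefore of any single statement or BLOB/TEXT value sent to the server (configurable between 1 KB and 1 GB in MySQL 5.5).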
When mysqldump creates INSERT statements, it splits them according to opt_net_buffer_length; if a single row exceeds this buffer, it forces a line break.
In normal configurations the limits of max_allowed_packet and MAX_BATCH_BUFFER_SIZE align, preventing crashes.
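The splitting rule can be sketched in Python as follows. This is a simplified model rather than mysqldump's actual code: build_inserts and its parameters are hypothetical names, and rows are assumed to be already rendered as SQL value tuples.

```python
def build_inserts(table, rows, net_buffer_length):
    """Group pre-rendered row tuples into multi-row INSERT statements.

    A new statement is started whenever adding the next row would push the
    current statement past net_buffer_length. A single row that is larger
    than the limit still gets a statement of its own.
    """
    prefix = f"INSERT INTO `{table}` VALUES "
    statements = []
    current = []               # rows in the statement being built
    current_len = len(prefix)  # length of the statement if flushed now
    for row in rows:
        added = len(row) + 1   # the row plus a comma or the final semicolon
        if current and current_len + added > net_buffer_length:
            statements.append(prefix + ",".join(current) + ";")
            current, current_len = [], len(prefix)
        current.append(row)
        current_len += added
    if current:
        statements.append(prefix + ",".join(current) + ";")
    return statements


rows = ["(1,'a')", "(2,'b')", "(3,'c')"]
for stmt in build_inserts("t", rows, net_buffer_length=40):
    print(len(stmt), stmt)
```

Because each statement stays near the buffer length, no single line of a well-formed dump should approach the 1 GB client buffer; only an unterminated construct can make the client keep accumulating input.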
GTID‑PURGED Bug that Breaks AUTO_POSITION Replication
In MySQL 5.6.22 a bug allows a primary server to accept a SET GLOBAL GTID_PURGED='…' that contains GTIDs not yet replicated to a replica. The primary silently sends those GTIDs to the replica, making the replica appear consistent while actually missing transactions.
Replication uses two GTID sets: Retrieved_Gtid_Set – GTIDs already fetched by the replica. Executed_Gtid_Set – GTIDs already executed.
During binlog scanning the primary calls find_first_log_not_in_gtid_set(slave_gtid_executed) to locate the first binlog containing a GTID not present on the replica. The bug occurs when the newly created binlog (B) after SET GLOBAL GTID_PURGED has an empty Previous_gtids_log_event, causing the scan to stop at the first binlog (A) without error.
The official fix adds a pre‑scan check: if the primary's GTID_PURGED set is not a subset of the replica's executed set (that is, the primary has purged GTIDs the replica still needs), the server aborts with error 1236, matching the second stop condition.
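The idea behind the pre‑scan check can be sketched in Python. This is an illustration of the set comparison only, not the server's implementation; parse_gtid_set, is_subset and can_auto_position are hypothetical names, and expanding intervals into Python sets is only practical for small examples.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def is_subset(inner, outer):
    """True if every GTID in inner is also present in outer."""
    return all(nums <= outer.get(uuid, set()) for uuid, nums in inner.items())


def can_auto_position(primary_purged, replica_executed):
    """The replica can only catch up if it already has everything the primary purged."""
    return is_subset(parse_gtid_set(primary_purged),
                     parse_gtid_set(replica_executed))


uuid = "3e11fa47-71ca-11e1-9e33-c80aa9429562"
print(can_auto_position(f"{uuid}:1-10", f"{uuid}:1-5"))  # replica is missing 6-10
print(can_auto_position(f"{uuid}:1-5", f"{uuid}:1-8"))   # nothing purged that is still needed
```

In the first call the primary has purged transactions 6 to 10 that the replica never executed, which is the situation the fix reports as an error instead of continuing silently.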
Replicate‑Do‑DB Filtering and GTID Continuity Issues
When a large MySQL instance is split across multiple servers, replicate‑do‑db filters binlog events so that each replica only replays statements for its assigned databases. Because the primary records GTID events for all databases, the replica’s Executed_Gtid_Set becomes fragmented, showing many gaps.
Consequences:
Long, unreadable GTID lists in SHOW SLAVE STATUS.
If the primary later purges old binlogs, the replica may request GTIDs that no longer exist on the primary, and the I/O thread stops with an error.
A mitigation is to have the replica record an empty transaction for filtered events, preserving GTID continuity. A patch (revno 5860) implements this for statement‑based binlog events; row‑based events already behave this way via check_table_map. The patch also ensures that CREATE/DROP TEMPORARY TABLE statements generate empty transactions when the replica uses row format.
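The difference between skipping filtered events and recording them as empty transactions can be shown with a small Python simulation. It models only the bookkeeping, not the patch itself; to_intervals, replica_executed and the sample workload are hypothetical.

```python
def to_intervals(numbers):
    """Collapse transaction numbers into GTID interval notation, e.g. '1:3:6-7'."""
    parts = []
    start = prev = None
    for n in sorted(numbers):
        if start is None:
            start = prev = n
        elif n == prev + 1:
            prev = n
        else:
            parts.append((start, prev))
            start = prev = n
    if start is not None:
        parts.append((start, prev))
    return ":".join(str(a) if a == b else f"{a}-{b}" for a, b in parts)


def replica_executed(transactions, do_db, record_empty):
    """transactions is a list of (gtid_number, database) pairs from the primary.

    With record_empty=False, filtered transactions leave no trace on the replica.
    With record_empty=True, they are logged as empty transactions instead.
    """
    applied = [n for n, db in transactions if db == do_db or record_empty]
    return to_intervals(applied)


workload = [(1, "a"), (2, "b"), (3, "a"), (4, "b"), (5, "b"), (6, "a"), (7, "a")]
print(replica_executed(workload, "a", record_empty=False))  # fragmented set
print(replica_executed(workload, "a", record_empty=True))   # one contiguous range
```

With empty transactions recorded, the replica's executed set stays a single interval regardless of how many databases are filtered out.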
TokuDB Optimize Table Size Growth Explained
After converting a MyISAM table to TokuDB, repeated OPTIMIZE TABLE commands caused the underlying index files to grow (e.g., from 47 MB to 79 MB). TokuDB writes dirty pages to the end of the file rather than overwriting existing blocks, leaving fragments.
During checkpoints these fragments are reclaimed and added to a free‑list, so the apparent growth does not indicate a leak. In practice, OPTIMIZE TABLE provides little benefit for TokuDB because the engine already maintains a “no fragmentation” design.
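A toy Python model can make this growth pattern concrete. It is a deliberately simplified sketch and not TokuDB's block allocator: the class ToyBlockFile, its first‑fit reuse policy, and the block sizes are all assumptions chosen for illustration.

```python
class ToyBlockFile:
    """Append-style block file with a checkpoint-driven free list (toy model)."""

    def __init__(self):
        self.file_size = 0
        self.live = {}       # block id -> (offset, size) of the current copy
        self.pending = []    # superseded extents waiting for a checkpoint
        self.free_list = []  # reclaimed extents that may be reused

    def write(self, block_id, size):
        """Write a new copy of a block without overwriting the old copy."""
        for i, (off, free_size) in enumerate(self.free_list):
            if free_size >= size:  # first fit among reclaimed extents
                self.free_list.pop(i)
                if free_size > size:
                    self.free_list.append((off + size, free_size - size))
                offset = off
                break
        else:
            offset = self.file_size  # nothing reusable, so grow the file
            self.file_size += size
        if block_id in self.live:
            self.pending.append(self.live[block_id])
        self.live[block_id] = (offset, size)

    def checkpoint(self):
        """Old copies become reusable only once a checkpoint has completed."""
        self.free_list.extend(self.pending)
        self.pending.clear()


f = ToyBlockFile()
f.write("leaf", 10)
f.write("leaf", 10)   # rewritten before any checkpoint: the file grows
print(f.file_size)
f.checkpoint()
f.write("leaf", 10)   # reuses the reclaimed extent: no further growth
print(f.file_size)
```

In this model the file is larger than the live data after a burst of rewrites, yet the extra space is tracked and reusable, which matches the article's point that the growth is not a leak.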
Key takeaways:
During optimization TokuDB flushes internal buffers with toku_ft_flush_some_child, moving data from internal nodes to leaf nodes.
Running OPTIMIZE TABLE on TokuDB is generally unnecessary.
Art of Distributed System Architecture Design
