Analysis of a Percona XtraDB Cluster (PXC) Flow‑Control Incident Triggered by Disk I/O and Large TEXT Fields
This article examines a real‑world Percona XtraDB Cluster (PXC) incident in which write‑node connection counts on two clusters spiked simultaneously. The trigger was Galera flow control, activated by 100% disk I/O on a shared read node and ultimately traced to slow sorting queries involving a mediumtext column. The article walks through the diagnosis and offers mitigation strategies.
Introduction
PXC (Percona XtraDB Cluster) is Percona's high‑availability and scalability solution for MySQL, offering synchronous replication, strong data consistency, multi‑master writes, and automatic node configuration. While powerful, it has operational constraints that administrators must understand.
Fault Phenomenon
During a certain period, write‑node connection counts on two clusters (A on port 3307 and B on port 3308) surged simultaneously, leading to a buildup of SQL‑running threads and noticeable impact on business services.
Analysis Process
Initial checks ruled out new deployments or scheduled tasks. Monitoring of the shared read node (db3) revealed an increase in the wsrep_flow_control_sent metric, indicating that flow‑control messages were being sent.
PXC Data Synchronization Process
The write node executes a transaction.
For each row modification, galera_append_key records the database, table, and key fields.
On commit, galera_append_data appends the binlog events to the write set, forming a (key, value) pair.
galera_pre_commit queues the write set and replicates it to all nodes in total order.
Each node assigns the write set a global sequence number (GTID) and runs certification (conflict detection) against concurrent transactions.
Based on the certification result, each node either applies and commits the write set or discards it.
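The certification outcome of the steps above can be observed on any node through standard Galera status counters. A minimal check (these are real wsrep status variable names; the output values shown are of course cluster-specific):

```sql
-- How many write sets failed certification on this node
SHOW GLOBAL STATUS LIKE 'wsrep_local_cert_failures';
-- Last GTID (cluster state UUID : sequence number) committed on this node
SHOW GLOBAL STATUS LIKE 'wsrep_last_committed';
```

Comparing wsrep_last_committed across nodes gives a quick picture of how far any node lags behind the cluster.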
Flow‑Control Principle
Flow control lets any node pause cluster‑wide replication when its receive queue grows too large, preventing a lagging node from falling further behind. When the queue length exceeds a configured threshold, the node broadcasts a flow‑control (FC) message that blocks further writes until its queue shrinks below a release threshold.
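As a sketch of how the two thresholds interact, take the hypothetical values gcs.fc_limit=100 and gcs.fc_factor=0.8 (illustrative only, not the incident cluster's actual settings):

```sql
-- Hypothetical tuning example; adjust values to your own workload
SET GLOBAL wsrep_provider_options = 'gcs.fc_limit=100; gcs.fc_factor=0.8';
-- An FC message is broadcast when wsrep_local_recv_queue exceeds 100;
-- writes resume once the queue drains below 100 * 0.8 = 80.
```

A fc_factor well below 1 gives the slow node room to drain its queue before writes resume, at the cost of longer individual pauses.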
Flow‑Control Variables and Status
gcs.fc_limit : Transaction count threshold that triggers FC.
gcs.fc_factor : Multiplier to determine the release threshold (gcs.fc_limit × gcs.fc_factor).
wsrep_local_recv_queue : Number of transactions pending on the node; exceeding gcs.fc_limit triggers FC.
wsrep_flow_control_paused_ns : Time (nanoseconds) replication is paused due to FC.
wsrep_flow_control_paused : Fraction of time since the last status reset that replication was paused by flow control.
wsrep_flow_control_recv : Count of flow‑control events received (not useful for pinpointing the culprit).
wsrep_flow_control_sent : Count of flow‑control events sent; useful for identifying the node that caused FC.
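The variables above can be sampled together on each node; the node whose sent counter climbs while its receive queue grows is the one throttling the cluster:

```sql
-- Snapshot of the flow-control counters listed above
SHOW GLOBAL STATUS WHERE Variable_name IN
  ('wsrep_local_recv_queue',
   'wsrep_flow_control_paused',
   'wsrep_flow_control_sent',
   'wsrep_flow_control_recv');
```

Polling this on every node (e.g., once a minute into a monitoring system) is what made it possible to single out db3 in this incident.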
Root Cause Investigation
Monitoring showed that during the incident the db3 server's disk I/O reached 100%, correlating with a sharp rise in wsrep_flow_control_sent . Disk‑I/O saturation slowed the read node, causing flow‑control to throttle write nodes and resulting in thread buildup.
Further inspection of slow‑query logs on cluster B revealed several concurrent sorting queries lasting over 700 seconds. The queries involved a mediumtext column ( outer_feedback_result ), which forces MySQL to use disk‑based temporary tables because the MEMORY engine cannot store TEXT/BLOB data.
Because the MEMORY engine cannot hold TEXT/BLOB data, any temporary table containing the mediumtext column went straight to disk; even without such columns, a temporary table that outgrows tmp_table_size (16 MiB by default) is converted to an on‑disk table. Either way, the switch to disk‑based sorting dramatically increased I/O.
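One common rewrite for such queries is a deferred join: sort on narrow indexed columns first, then fetch the large TEXT column only for the final rows. The table and non‑TEXT column names below are illustrative (only outer_feedback_result appears in the original incident):

```sql
-- Hypothetical schema: feedback(id PK, created_at indexed,
--                               outer_feedback_result MEDIUMTEXT)
-- Instead of sorting rows that carry the MEDIUMTEXT payload:
--   SELECT * FROM feedback ORDER BY created_at DESC LIMIT 100;
-- sort narrow rows first, then join back for the large column:
SELECT f.*
FROM feedback f
JOIN (SELECT id FROM feedback
      ORDER BY created_at DESC LIMIT 100) t
  ON f.id = t.id;
```

The inner query sorts only (id, created_at) pairs, which fit in memory, so the mediumtext column is read for just the 100 rows actually returned.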
Mitigation and Recommendations
Monitor and alert on flow‑control metrics ( wsrep_flow_control_sent , wsrep_flow_control_paused ) and disk I/O.
Use the wsrep_cert_deps_distance status variable to gauge achievable apply parallelism, and increase wsrep_slave_threads on read nodes accordingly to improve concurrency.
Avoid queries that select large TEXT/BLOB columns unless necessary; exclude them or use covering indexes.
Consider off‑loading heavy analytical queries to a separate reporting database.
Isolate resource‑intensive workloads across different DB instances to prevent cross‑impact.
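As a sketch of the applier-concurrency recommendation, the thread count below is a placeholder; a common heuristic is a few times the core count, capped near the observed wsrep_cert_deps_distance:

```sql
-- Example only: raise parallel applier threads on the read node.
-- Check wsrep_cert_deps_distance first; there is no benefit in
-- configuring more threads than the workload's dependency distance.
SET GLOBAL wsrep_slave_threads = 8;
```

Note that in PXC 8.0 this variable was renamed wsrep_applier_threads; the older name applies to 5.x clusters.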
Conclusion
The incident was caused by concurrent slow sorting queries on cluster B that involved a mediumtext column, leading to disk‑based temporary tables, 100% disk I/O on the shared read node, activation of Galera flow‑control, and subsequent write‑node thread accumulation.
Key takeaways: monitor flow‑control and I/O, avoid large TEXT fields in performance‑critical queries, and isolate workloads to protect cluster stability.
References
MySQL 运维内参 (MySQL Operations Internals), by Zhou Yanwei, Wang Zhufeng, and Qiang Changjin
“Galera Flow Control in Percona XtraDB Cluster for MySQL” by Jay Janssen
MySQL 8.0 Reference Manual, Section 11.3.4 – The BLOB and TEXT Types
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.