Why MyCat’s Pseudo‑Distributed MySQL Solution Fails and What to Do Instead

The article examines MyCat’s middleware‑based pseudo‑distributed MySQL architecture, outlines its storage, scalability, and reliability shortcomings, walks through common solutions like disk expansion, compression, and sharding, and finally offers practical steps and alternative technologies for building truly distributed database systems.

dbaplus Community

Background

Distributed databases are evolving rapidly to meet the growing data volume and transaction load of modern internet services. A traditional MySQL instance keeps all of its data on a single node, so its capacity is bounded by that node's disks, and storage becomes a bottleneck once data grows beyond a few terabytes.

Three Basic Ways to Address Storage Limits

Increase Disk Capacity – Adding more disks (e.g., from 800 GB to 2 TB or 5 TB) is the simplest fix, but it raises operational concerns such as backup, recovery time, and DBA workload for massive instances.

Data Compression – InnoDB’s native compression can reduce storage to one‑third or half of the original size, at the cost of some performance degradation, especially for latency‑sensitive workloads.

Data Sharding – Splitting data across multiple MySQL instances (or other stores like HBase, Redis) provides the most scalable solution, though it introduces complexity in routing, metadata management, and consistency.
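The routing complexity mentioned for sharding can be made concrete with a minimal sketch. It assumes hash-based placement on a single sharding key; the shard names and the functions `shard_for` and `group_by_shard` are hypothetical and do not come from the article or from any specific middleware.

```python
import zlib

# Hypothetical shard list; the names are placeholders, not real hosts.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Map a sharding key to one shard with a stable hash modulo the shard count."""
    # zlib.crc32 is deterministic across processes, unlike the built-in hash() for str.
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]


def group_by_shard(keys, shards=SHARDS):
    """Group keys by target shard, as a router must do before fanning out a multi-key query."""
    groups = {}
    for key in keys:
        groups.setdefault(shard_for(key, shards), []).append(key)
    return groups


if __name__ == "__main__":
    for shard, keys in sorted(group_by_shard(["user:1", "user:2", "user:3", "user:4"]).items()):
        print(shard, keys)
```

Even in this small form, every query that touches more than one key has to be split per shard and its results merged, which is the work a middleware layer takes on.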

Requirements for a Distributed Solution

Scalability – Ability to add nodes without affecting existing services.

Transactional Support – Transactions that span multiple shards must remain atomic and consistent.

Full SQL Compatibility – Applications should continue to use standard MySQL statements.

Performance – Overhead of distribution should be minimal.

Metadata Change Transparency – Schema changes must propagate safely across shards.

Underlying DB High Availability – The base MySQL cluster must guarantee consistency and failover.

Popular Middleware: MyCat

MyCat is a widely discussed MySQL middleware that claims to provide automatic sharding, aggregation, and load balancing. Its architecture is illustrated below:

MyCat architecture diagram

Despite the appealing diagram, several critical issues arise:

Routing Logic – MyCat relies on a static schema.xml file to map tables to shards, making dynamic routing and schema evolution cumbersome.

Rebalancing – Adding nodes typically requires manual data export/import and a reload of the configuration, a process that can overwhelm DBAs.

Global Tables – Implemented by creating identical tables on every shard, which raises consistency and performance concerns as shard count grows.

Distributed Transactions – MyCat depends on MySQL XA, which is rarely used in production due to performance and reliability issues.

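Failover – Automatic backend failover is decided from a single MyCat node’s own view of backend health. When several MyCat nodes or a network partition are involved, they can disagree about which backend is the primary, risking split‑brain scenarios.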

Backup & Recovery – Each shard must be backed up individually; restoring a consistent snapshot across all shards is non‑trivial.

Configuration Complexity – The XML configuration is verbose and error‑prone, discouraging adoption.
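To see why rebalancing is so costly, the sketch below counts how many rows change shards when a node is added. It assumes a plain modulo placement rule on an integer key, which is an assumption for illustration only: the article does not say which partition rule is configured, and `shard_index` and `moved_fraction` are hypothetical helpers.

```python
def shard_index(key: int, shard_count: int) -> int:
    """Assumed placement rule: integer key modulo the number of shards."""
    return key % shard_count


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    keys = list(keys)
    moved = sum(1 for k in keys if shard_index(k, old_count) != shard_index(k, new_count))
    return moved / len(keys)


if __name__ == "__main__":
    sample = range(100_000)
    print("4 -> 5 shards:", moved_fraction(sample, 4, 5))
    print("4 -> 8 shards:", moved_fraction(sample, 4, 8))
```

Under this rule, going from 4 to 5 shards relocates 80% of the sample keys, and doubling from 4 to 8 still relocates half of them. With static configuration, all of that data has to be exported and re-imported by hand.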

Practical Migration Steps if MyCat Becomes Too Risky

Stop all write traffic.

Export all databases using logical tools such as mysqldump to generate .sql files.

Choose a robust MySQL architecture (e.g., a true distributed database or a shared‑nothing cluster) and import the dumps.

Migrate read traffic to the new system.

Finally, migrate write traffic and bring the new cluster online.

This process can take days for large datasets: a logical dump has to be replayed on the target as SQL statements, and indexes are rebuilt during the import.
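The export step and its cost can be sketched as follows. The helper `build_dump_command` only assembles a `mysqldump` argument list for one shard and does not run it; the host, user, and file names are placeholders. The throughput figure passed to `estimated_hours` is an assumed number for illustration, not a measurement from the article.

```python
def build_dump_command(host: str, database: str, out_file: str, user: str = "backup"):
    """Return the argv list for a logical dump of one database on one shard."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",             # credentials are expected to come from an option file
        "--single-transaction",       # consistent snapshot for InnoDB tables without locking them
        "--routines",                 # include stored procedures and functions
        "--triggers",                 # include triggers
        f"--result-file={out_file}",  # write the .sql file directly
        "--databases", database,
    ]


def estimated_hours(size_gb: float, throughput_mb_per_s: float) -> float:
    """Back-of-envelope duration for moving size_gb at a steady throughput."""
    return size_gb * 1024 / throughput_mb_per_s / 3600


if __name__ == "__main__":
    # Hypothetical shard hosts and database name.
    for i in range(2):
        print(" ".join(build_dump_command(f"mysql-shard-{i}", "appdb", f"shard{i}.sql")))
    # Assumed: 2 TB of data at 20 MB/s sustained.
    print(round(estimated_hours(2048, 20), 1), "hours")
```

With those assumed numbers the export alone takes roughly 29 hours, before the import, which is typically slower, even starts. Each argument list could be handed to `subprocess.run` once write traffic has been stopped.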

Alternative Distributed Database Solutions

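For truly distributed systems, consider products such as Google Spanner, F1, TiDB, or SequoiaDB, which provide native sharding, strong consistency, and distributed transactions. MySQL compatibility varies among them: TiDB speaks the MySQL wire protocol, whereas Spanner and F1 are Google systems with their own SQL interfaces, so verify how much application change each option actually requires.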

Conclusion

MyCat remains popular because it is open source and free, but its pseudo‑distributed nature introduces many operational pitfalls. Organizations should evaluate whether a genuine distributed database better fits their scalability, reliability, and performance requirements.
