How We Scaled Billion‑Row MySQL Tables with Simple Sharding Strategies
This article shares a practical journey of handling MySQL tables that grew beyond a billion rows, describing a temporary backup‑table fix, a custom hash‑based sharding implementation, data migration tactics, validation steps, and key lessons learned for large‑scale database management.
Background
Our production MySQL database grew rapidly; several tables exceeded a billion rows, and more than 2 million new rows arrived every day. Queries on these large tables took minutes, especially when the business needed joins or reporting.
A fair question is why we only looked for a solution once the tables were already this large. The answer is a mix of historical reasons and underestimated growth.
Temporary Fix
Under tight deadlines and limited manpower we applied a two‑step approach:
Rename the original table (e.g., add _190416bak suffix).
Create a new table with the original name.
New data writes to the new table while the old data remains untouched, providing immediate relief.
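As a rough illustration of the two steps above, the sketch below builds the MySQL statements involved. The table name busy and the use of CREATE TABLE ... LIKE are assumptions for the example; the article only says the table was renamed and a new one created under the original name.

```java
// Sketch only: builds the two statements behind the temporary fix.
// Table and suffix values are illustrative, not taken from the real schema.
final class BackupSwap {
    /** Returns the statements in the order they would be executed. */
    static String[] statements(String table, String suffix) {
        String backup = table + suffix;
        return new String[] {
            // Step 1: move the full table out of the way under a dated name.
            "RENAME TABLE `" + table + "` TO `" + backup + "`",
            // Step 2: create an empty table with the same definition under the original name.
            "CREATE TABLE `" + table + "` LIKE `" + backup + "`"
        };
    }
}
```

Calling BackupSwap.statements("busy", "_190416bak") yields the rename followed by the create. Old rows stay in the backup table, so anything that still needs them has to query it explicitly.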
Sharding Strategy
To permanently solve the problem we needed a proper sharding plan. The key points were:
Identify a sharding field that is not strongly related to other queries. In our IoT scenario the device IMEI (a unique integer) fit perfectly.
Choose between time‑based partitioning (good for recent‑data queries) and hash‑based partitioning (suitable when all data may be queried). We opted for hash sharding.
Hash Sharding Implementation
We implemented a lightweight hash sharding layer in the JDBC code instead of using a full-blown sharding-jdbc framework. The core logic is:
int index = hash(shardingField) % shardCount;
and the routed query becomes:
select xx from 'busy_' + index where shardingField = xxx;
This calculates the target table name and routes the query accordingly. The implementation modifies the low-level query methods to perform this routing, without full SQL parsing or result merging.
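A minimal sketch of what such a routing helper could look like follows. The class name ShardRouter, the column name imei, and the choice of Long.hashCode as the hash function are assumptions for illustration; the article does not say which hash was used. Math.floorMod is used instead of % so that a negative hash can never produce a negative table index.

```java
// Sketch only: resolves the shard table for an IMEI and builds the routed query.
// The hash function and column name are assumptions, not the article's actual code.
final class ShardRouter {
    private final String prefix;
    private final int shardCount;

    ShardRouter(String prefix, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.prefix = prefix;
        this.shardCount = shardCount;
    }

    /** Shard index in [0, shardCount); floorMod keeps it non-negative. */
    int indexFor(long imei) {
        return Math.floorMod(Long.hashCode(imei), shardCount);
    }

    /** Physical table name, e.g. busy_2. */
    String tableFor(long imei) {
        return prefix + indexFor(imei);
    }

    /** SQL with the table name resolved; the IMEI itself stays a bind parameter. */
    String selectByImei(String columns, long imei) {
        return "SELECT " + columns + " FROM " + tableFor(imei) + " WHERE imei = ?";
    }
}
```

With new ShardRouter("busy_", 64), every query for the same IMEI resolves to the same table, which is the only guarantee the routing layer needs. A 15-digit IMEI does not fit in an int, which is why the sketch takes a long.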
Business Adjustments
Because we did not use a third‑party sharding component, every piece of business code that accessed sharded tables required modification. We decided to split the tables into 64 shards (a power of two) to simplify modulo calculations and future growth.
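Note: With a power-of-two shard count the modulo can be computed as a bit mask (hash & (shardCount - 1)), and doubling the count later means each row either stays in its shard or moves to exactly one new shard.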
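To make the power-of-two point concrete, here is a small sketch under the assumption that the hash is a plain Java int. The class and method names are hypothetical. It shows the mask form of the modulo and can be used to check what happens to an index when the shard count doubles from 64 to 128.

```java
// Sketch only: the bit-mask form of modulo for power-of-two shard counts.
final class PowerOfTwoShards {
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    /** Equals Math.floorMod(hash, shardCount) whenever shardCount is a power of two. */
    static int maskIndex(int hash, int shardCount) {
        if (!isPowerOfTwo(shardCount)) {
            throw new IllegalArgumentException("shardCount must be a power of two");
        }
        // Keeps only the low bits, so the result is non-negative even for negative hashes.
        return hash & (shardCount - 1);
    }
}
```

For any hash h, maskIndex(h, 128) is either maskIndex(h, 64) or that value plus 64. In practice this means a future split only has to move rows from shard i to shard i + 64, never reshuffle everything.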
We also ensured each table has a sortable field (e.g., auto‑increment ID or creation timestamp) to avoid full‑table scans. For small, special‑type data that would cause pagination issues after sharding, we created separate tables.
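One common way to use such a sortable field is keyset pagination inside a single shard, sketched below. The column names imei and id, and the presence of an index covering them, are assumptions for the example rather than details from the article.

```java
// Sketch only: builds a "next page" query that resumes after the last id seen.
// Column names are illustrative; an index on (imei, id) is assumed.
final class ShardPageQuery {
    /**
     * Bind parameters, in order: the IMEI, the last id of the previous page
     * (0 for the first page), and the page size.
     */
    static String nextPageSql(String shardTable, String columns) {
        return "SELECT " + columns
             + " FROM " + shardTable
             + " WHERE imei = ? AND id > ?"
             + " ORDER BY id LIMIT ?";
    }
}
```

Because each page starts from the last id already returned instead of an OFFSET, the cost of fetching a page does not grow with how deep into the result set the reader is.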
Validation
After code changes and unit tests, we verified sharding by renaming the original tables (adding a suffix), so that any code path still pointing at the unsharded table name would fail, and then watching for runtime errors during testing.
Deployment & Data Migration
Before going live we migrated existing data into the 64 new shards. The migration program copied rows according to the hash rule. Lack of an index on create_time made the migration slow, and we had to inform users that recent historical data might be temporarily unavailable.
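The copy step of such a migration can be sketched as grouping each batch of rows by target shard before writing. The helper below is hypothetical and assumes the same Long.hashCode plus floorMod rule for the shard index; the important part is that the migration and the live routing code share one rule, whatever it is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToLongFunction;

// Sketch only: splits one batch of source rows into per-shard lists.
// The hash rule is an assumption and must match the one used by the query router.
final class MigrationPlanner {
    static <T> Map<Integer, List<T>> groupByShard(
            List<T> rows, ToLongFunction<T> imeiOf, int shardCount) {
        Map<Integer, List<T>> byShard = new TreeMap<>();
        for (T row : rows) {
            long imei = imeiOf.applyAsLong(row);
            int index = Math.floorMod(Long.hashCode(imei), shardCount);
            // Rows keep their original order within each shard's list.
            byShard.computeIfAbsent(index, k -> new ArrayList<>()).add(row);
        }
        return byShard;
    }
}
```

Each map entry can then be written to its shard table with one batched insert. Reading the source table in ranges of an indexed, sortable column keeps each batch bounded, which is where the missing create_time index hurt.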
Conclusion
Key takeaways:
A solid product plan is essential for timely data handling (archiving or sharding).
Every table should have a sortable field to enable efficient range queries.
Choose the sharding field carefully to minimize full‑table scans.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
