When and How to Apply Database Sharding (Split Database and Tables)
This article explains the signs that indicate a need for database sharding, outlines common performance bottlenecks, and presents optimization techniques such as SQL tuning, table redesign, and read‑write separation. It then details vertical and horizontal splitting strategies, their implementation within a single database or across multiple databases, and the added complexities such as cross‑database joins, distributed transactions, and ID generation.
Why Split Database and Tables?
When a database reaches a performance bottleneck (manifested as request blocking, slow SQL queries, or storage pressure), upgrading hardware alone becomes costly, so software‑level solutions such as sharding are preferred.
Database Optimization Options
Optimizations are divided into software and hardware layers. Software includes SQL tuning, table redesign, read‑write separation, clustering, and sharding; hardware mainly means adding more powerful machines.
SQL Tuning
Enable slow‑query logging in MySQL:
slow_query_log=on
long_query_time=1
slow_query_log_file=/path/to/log
Use EXPLAIN to check execution plans, e.g.:
EXPLAIN select id, age, gender from user where name = '爱笑的架构师';
Look at the type column of the output; aim for range or better (ref, eq_ref, const), since ALL indicates a full table scan.
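As a rough, runnable stand-in for the EXPLAIN check, the sketch below uses Python's built-in sqlite3 module instead of MySQL. That substitution is an assumption made only so the example runs without a server. SQLite's EXPLAIN QUERY PLAN plays a similar role: a SCAN line corresponds loosely to a full table scan, and a SEARCH ... USING INDEX line to an indexed lookup. The sample rows and the index name idx_user_name are hypothetical.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the plan descriptions SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, gender TEXT)"
)
conn.executemany(
    "INSERT INTO user (name, age, gender) VALUES (?, ?, ?)",
    [("user%d" % i, 20 + i % 30, "F" if i % 2 else "M") for i in range(1000)],
)

sql = "SELECT id, age, gender FROM user WHERE name = 'user42'"

plan_before = query_plan(conn, sql)  # no index on name yet
conn.execute("CREATE INDEX idx_user_name ON user (name)")  # hypothetical index
plan_after = query_plan(conn, sql)   # the same query can now use the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan lines differs between SQLite versions, but the first plan should describe a scan and the second a search through idx_user_name. In MySQL the equivalent change would show up as the type column moving from ALL to ref.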
Table Structure Optimization
Redundant fields can reduce joins; for example, adding nickname to the order table avoids joining the large user table when displaying order lists.
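A minimal sketch of the redundant-field idea, again using sqlite3 as a stand-in for MySQL. The table layout, the helper create_order, and the sample values are assumptions for illustration, not the article's schema. The nickname is copied into the order row when the order is created, so the order list can be read from one table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, nickname TEXT);
-- nickname is a redundant copy, filled in when the order is created
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER,
    amount INTEGER,
    nickname TEXT
);
""")
conn.execute("INSERT INTO user (id, nickname) VALUES (1, 'alice')")

def create_order(conn, order_id, user_id, amount):
    """Insert an order and copy the buyer's current nickname into it."""
    (nickname,) = conn.execute(
        "SELECT nickname FROM user WHERE id = ?", (user_id,)
    ).fetchone()
    conn.execute(
        "INSERT INTO orders (id, user_id, amount, nickname) VALUES (?, ?, ?, ?)",
        (order_id, user_id, amount, nickname),
    )

create_order(conn, 100, 1, 2599)

# Order list built with a join: this is what the redundant column avoids.
joined = conn.execute(
    "SELECT o.id, u.nickname, o.amount FROM orders o JOIN user u ON u.id = o.user_id"
).fetchall()

# Order list read from the orders table alone.
flat = conn.execute("SELECT id, nickname, amount FROM orders").fetchall()

print(joined)
print(flat)
```

The price of the shortcut is that the copy can go stale: if a user later changes their nickname, existing order rows keep the old value unless the application updates them too.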
Architectural Optimization
Scale out by adding read replicas, introduce a cache layer (e.g., Redis) to offload reads, and finally consider sharding when cache is insufficient.
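The read path described here can be sketched roughly as follows. FakeDB and ReadWriteRouter are hypothetical in-memory stand-ins, a plain dict plays the role of Redis, and replication is simulated synchronously, whereas real replicas usually lag behind the primary.

```python
import itertools

class FakeDB:
    """In-memory stand-in for one database node (hypothetical)."""
    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.rows[key] = value


class ReadWriteRouter:
    """Send writes to the primary, spread reads over replicas, cache reads."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self._next_replica = itertools.cycle(replicas)  # round-robin
        self.cache = {}  # stand-in for a Redis cache

    def write(self, key, value):
        self.primary.put(key, value)
        # Simulated replication; in production this is asynchronous.
        for replica in self.replicas:
            replica.put(key, value)
        self.cache.pop(key, None)  # invalidate so the next read reloads

    def read(self, key):
        if key in self.cache:
            return self.cache[key]          # cache hit: no database touched
        value = next(self._next_replica).get(key)
        if value is not None:
            self.cache[key] = value         # cache-aside fill
        return value


primary = FakeDB("primary")
replicas = [FakeDB("replica-1"), FakeDB("replica-2")]
router = ReadWriteRouter(primary, replicas)

router.write("user:1", {"name": "alice"})
first = router.read("user:1")    # served by a replica, then cached
second = router.read("user:1")   # served by the cache
print(first, second, primary.reads, [r.reads for r in replicas])
```

In this sketch the primary never serves a read, and the second read does not reach any database. Writes still all land on one primary, which is the limit that sharding addresses.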
Hardware Optimization
While upgrading CPU, memory, disk, or network can help early on, the cost‑benefit ratio diminishes as the system grows.
Detailed Sharding Walkthrough
Using an e‑commerce system as an example, the evolution proceeds from a single‑application single‑database monolith to multiple applications sharing one database, then to each service having its own database (splitting databases), and finally to splitting large tables.
Horizontal vs. Vertical Splitting
Vertical splitting separates columns into different tables (e.g., moving the rarely used nickname and description columns to a detail table). Horizontal splitting distributes rows across tables that share the same structure, for example by ID parity, time range, or another criterion.
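The two styles can be sketched in a few lines. The column grouping and the table names (user_0, user_1, order_2024_03) are hypothetical, chosen only to mirror the examples in the text.

```python
from datetime import date

# Vertical split: one logical user record stored as two narrower rows.
HOT_COLUMNS = ("id", "name", "age", "gender")      # read on most requests
COLD_COLUMNS = ("id", "nickname", "description")   # rarely read

def split_vertically(user):
    """Return (base_row, detail_row); both keep id so they can be re-joined."""
    base = {column: user[column] for column in HOT_COLUMNS}
    detail = {column: user[column] for column in COLD_COLUMNS}
    return base, detail

# Horizontal split: every table has the same columns, rows are distributed.
def table_by_parity(user_id):
    """Route by ID parity: even IDs go to user_0, odd IDs to user_1."""
    return "user_%d" % (user_id % 2)

def table_by_month(created):
    """Route by time range: one table per calendar month."""
    return "order_%04d_%02d" % (created.year, created.month)


user = {
    "id": 7, "name": "alice", "age": 30, "gender": "F",
    "nickname": "ali", "description": "likes long walks",
}
base, detail = split_vertically(user)
print(base)
print(detail)
print(table_by_parity(7), table_by_month(date(2024, 3, 15)))
```

Vertical splitting changes the shape of each row, while horizontal splitting changes only which table a row lives in.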
Single‑Database vs. Multi‑Database Table Splitting
Horizontal shards can reside in the same database (single‑DB split) or be spread across several databases (multi‑DB split) to overcome storage limits.
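One possible way to route a row in either layout is shown below. The modulo scheme, the default counts, and the names order_db_N and t_order_N are assumptions for illustration. Setting db_count to 1 gives the single‑DB split.

```python
def locate(user_id, db_count=2, tables_per_db=4):
    """Map a user ID to (database name, table name) using modulo routing."""
    slot = user_id % (db_count * tables_per_db)  # global shard slot
    db_index = slot // tables_per_db             # which database holds the slot
    table_index = slot % tables_per_db           # which table inside that database
    return "order_db_%d" % db_index, "t_order_%d" % table_index


print(locate(13))                 # multi-DB split
print(locate(13, db_count=1))     # single-DB split, same table count per database
```

A fixed modulo is simple but makes later resizing expensive, because changing either count moves most rows to a different slot.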
Complexities Introduced by Sharding
Cross‑Database Joins: Use field redundancy, ETL‑generated global tables, or application‑level assembly.
Distributed Transactions: Replace local transactions with reliable‑message queues, two‑phase commit, or flexible (eventually consistent) transaction patterns.
Ordering, Pagination, and Function Computation: Execute these operations on each shard then merge results.
Distributed ID Generation: Options include UUID, dedicated ID tables, segment allocation, Redis, Snowflake, Baidu uid‑generator, Meituan Leaf, Didi TinyID, etc.
Multiple Data Sources: Middleware such as ShardingSphere (formerly Sharding‑JDBC) or Mycat can abstract multiple databases.
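Of the complexities listed above, ordering and pagination are the easiest to get subtly wrong, so here is a minimal sketch of the execute-on-each-shard-then-merge approach. The shards are plain Python lists, and query_shard stands in for an ORDER BY ... LIMIT query. Both are hypothetical, as is the created_at field.

```python
import heapq
from itertools import islice

def query_shard(rows, limit):
    """Stand-in for `ORDER BY created_at DESC LIMIT limit` on one shard."""
    return sorted(rows, key=lambda r: r["created_at"], reverse=True)[:limit]

def paginate(shards, page, page_size):
    """Return one globally ordered page across all shards (pages start at 1)."""
    offset = (page - 1) * page_size
    # Any single shard might hold all of the first offset + page_size rows,
    # so each shard has to return that many rows, not just page_size.
    per_shard = [query_shard(rows, offset + page_size) for rows in shards]
    merged = heapq.merge(*per_shard, key=lambda r: r["created_at"], reverse=True)
    return list(islice(merged, offset, offset + page_size))


shards = [
    [{"id": 1, "created_at": 10}, {"id": 3, "created_at": 30}, {"id": 5, "created_at": 50}],
    [{"id": 2, "created_at": 20}, {"id": 4, "created_at": 40}, {"id": 6, "created_at": 60}],
]
page_two = paginate(shards, page=2, page_size=2)
print([row["id"] for row in page_two])
```

Because every shard returns offset + page_size rows, deep pages get progressively more expensive, which is one reason keyset-style pagination is often preferred on sharded data.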
Conclusion
Before jumping to sharding, first try conventional optimizations; sharding adds significant complexity and should be adopted only when truly necessary, with careful architectural foresight.
Wukong Talks Architecture
Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.
