Mastering Database Sharding: Vertical & Horizontal Partitioning Strategies
This article explains the core concepts of database sharding, detailing vertical and horizontal partitioning methods, strategies for combining them, handling shared data, and addressing challenges such as distributed transactions, cross‑node joins, and aggregation queries, with practical insights for scalable system design.
1. Basic Idea
Sharding’s basic idea is to split a database into multiple parts placed on different servers, alleviating the performance bottleneck of a single database. For massive data, vertical partitioning separates closely related tables (e.g., same module) onto one server, while horizontal partitioning distributes rows of a single large table across multiple servers based on a rule such as ID hashing. In practice, both approaches are often combined, creating a matrix‑like array of shards that can be expanded indefinitely.
Vertical partitioning is simple and convenient, especially when business modules have low coupling and clear logic, allowing tables of different modules to be placed in separate databases with minimal impact on applications (the “share nothing” principle).
Horizontal partitioning is more complex because it splits rows of the same table across different databases, making the partitioning rules and later data maintenance more intricate.
2. Partitioning Strategy
The typical approach is to perform vertical partitioning first, which prepares the ground for horizontal partitioning. Vertical partitioning groups tables with strong aggregation relationships (often a domain‑driven “aggregate”). Within each vertical shard, identify the aggregate root and horizontally partition based on that root, placing all related rows into the same shard. This minimizes cross‑shard relationships. For example, in a social network, user‑centric sharding is natural; in a forum, the Forum aggregate can be the horizontal shard key.
Read‑only dictionary tables can be duplicated in each shard to avoid breaking relationships. For mutable shared data, cross‑shard joins must be broken.
When both vertical and horizontal partitioning are applied, the strategy shifts: vertical groups can no longer be defined solely by functional modules; they must align with fine‑grained aggregates, matching domain‑driven design concepts.
1. Transaction Issues:
Two viable solutions exist for handling transactions across shards:
Solution 1: Use distributed transactions managed by the database.
Advantages: Simple and effective, handled by the DBMS
Disadvantages: High performance cost, especially as the number of shards growsSolution 2: Let the application and database jointly control transactions.
Principle: Split a distributed transaction into multiple single‑database sub‑transactions and coordinate them in the application.
Advantages: Better performance
Disadvantages: Requires flexible transaction control in the application; integrating with frameworks like Spring can be challenging.2. Cross‑Node Join Issues
Cross‑node joins are inevitable after sharding, but good design can reduce their occurrence. A common workaround is a two‑step query: first retrieve the IDs of related data, then issue a second query to fetch the actual records.
3. Cross‑Node count, order by, group by, and aggregation issues
These operations need to process the entire data set, which most proxies cannot merge automatically. The typical solution mirrors the cross‑node join approach: compute results on each shard in parallel, then merge them in the application. While this can be faster than querying a single large table, large result sets may consume significant application memory.
Source article: http://blog.csdn.net/huaweitman/article/details/50560089
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
