Mastering Database Sharding: Strategies, Pitfalls, and Practical Tips
This article reviews the background, side effects, sharding strategies, and key considerations of database partitioning, illustrating challenges such as merge‑sort pagination, deep‑paging performance, hash‑based routing, and integration issues with Sharding‑JDBC, while offering practical solutions and best‑practice recommendations.
Background
The author describes a recent project that split two massive tables—orders and coupons—across the entire group of subsidiaries. Rapid business growth pushed storage, write, and read demands to critical levels, prompting a horizontal logical‑DB sharding approach to support three to five years of growth.
Side Effects of Sharding
Sharding introduces architectural complexity, especially when selecting an effective sharding key. Good keys (e.g., user_id, order_type, merchant_id, channel_id, batch_id) align with business boundaries and improve performance; poor keys require extensive scaffolding and increase system complexity. Additional challenges include sorting and pagination across shards.
1. Merge‑Sort Pagination
When data is distributed across multiple shards, each shard returns its own rows in order, but concatenating the per-shard results does not produce a globally ordered set. To achieve correct global ordering, a merge‑sort must be performed after fetching enough rows from each shard.
shard node 1: {1,3,5,7,9}
shard node 2: {2,4,6,8,10}
Assume a page size of 4 and a current page of 1. The naive approach of reading the first two rows from each shard works only when the data is perfectly balanced across shards. In the general case all rows of a page may sit on a single shard, so every shard must return its first pageNumber * pageSize rows before the merge, and the coordinator has to sort up to pageNumber * pageSize * totalShards rows in total. Memory use and latency therefore grow with page depth, which makes deep pagination a performance bottleneck.
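The merge step described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function name fetch_page is hypothetical, plain sorted lists stand in for the per-shard ORDER BY ... LIMIT queries, and the skewed data set is invented to show why two rows per shard are not enough.

```python
import heapq
from itertools import islice

def fetch_page(shards, page, page_size):
    """Return one globally ordered page from per-shard sorted lists."""
    # Each shard may hold the whole page, so ask every shard for its
    # first page * page_size rows (stand-in for ORDER BY ... LIMIT n).
    needed = page * page_size
    partials = [shard[:needed] for shard in shards]
    # k-way merge of inputs that are already sorted individually.
    merged = heapq.merge(*partials)
    start = (page - 1) * page_size
    return list(islice(merged, start, start + page_size))

shard_1 = [1, 3, 5, 7, 9]
shard_2 = [2, 4, 6, 8, 10]
# Hypothetical skewed distribution: page 1 lives entirely on one shard.
skewed_1 = [1, 2, 3, 4, 9]
skewed_2 = [5, 6, 7, 8, 10]

print(fetch_page([shard_1, shard_2], page=1, page_size=4))
print(fetch_page([skewed_1, skewed_2], page=1, page_size=4))
print(fetch_page([shard_1, shard_2], page=2, page_size=4))
```

Because needed grows with the page number, the amount of data pulled from every shard grows linearly with page depth, which is the cost the next section tries to avoid.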
2. Deep‑Paging Performance Issues
To mitigate the cost of reading large numbers of rows, the article suggests two query‑adjustment strategies:
Explicit filtering: Narrow the result set by adding conditions such as time range, payment status, or delivery status.
Implicit filtering: Once paging reaches a depth limit, stop increasing the offset and instead move the boundary of the ordering field (e.g., the timestamp of the last row already shown) into the query condition.
When rows are ordered by createDateTime, each page interleaves rows from both shards. By moving a condition such as where createDateTime > '2018-01-11 10:10:13' forward to the last timestamp already displayed, subsequent pages can be fetched without re‑reading previously seen rows.
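A rough sketch of this boundary-shifting idea follows, using two in-memory SQLite databases as stand-ins for shards. The table layout, the sample rows, and the helper names make_shard and next_page are all assumptions made for illustration. The sketch also assumes createDateTime values are unique; with duplicate timestamps a tie-breaker column would be needed in both the condition and the ordering.

```python
import heapq
import sqlite3

def make_shard(rows):
    """Create an in-memory SQLite database standing in for one shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, createDateTime TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn

def next_page(shards, last_seen, page_size):
    """Fetch the page that follows last_seen without using OFFSET."""
    sql = ("SELECT createDateTime, id FROM orders "
           "WHERE createDateTime > ? ORDER BY createDateTime LIMIT ?")
    # Every shard returns at most page_size rows, however deep we are.
    partials = [conn.execute(sql, (last_seen, page_size)).fetchall()
                for conn in shards]
    return list(heapq.merge(*partials))[:page_size]

# Hypothetical sample data: timestamps alternate between the two shards.
shards = [
    make_shard([(101, "2018-01-11 10:10:10"), (102, "2018-01-11 10:10:12"),
                (103, "2018-01-11 10:10:14"), (104, "2018-01-11 10:10:16")]),
    make_shard([(201, "2018-01-11 10:10:11"), (202, "2018-01-11 10:10:13"),
                (203, "2018-01-11 10:10:15"), (204, "2018-01-11 10:10:17")]),
]

page_1 = next_page(shards, "", 4)             # empty string sorts first
page_2 = next_page(shards, page_1[-1][0], 4)  # boundary = last row shown
print(page_1)
print(page_2)
```

With a page size of 4 the first page ends at 10:10:13, so the second query carries that value as its boundary and each shard reads only rows it has not yet returned.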
3. Sharding Strategy
The implementation uses a combination of mod and preSharding. This approach simplifies routing but creates data‑migration challenges when nodes change. The solution adopts consistent hashing with virtual nodes to smooth migrations.
physical node: node 1 node 2 node 3 node 4
virtual node: node 1 node 2 ... node 20
node mapping:
virtual node 1~5 -> physical node 1
virtual node 6~10 -> physical node 2
virtual node 11~15 -> physical node 3
virtual node 16~20 -> physical node 4
Hash values are stored in the table to avoid costly re‑hashing during migrations.
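The mapping above might be expressed as follows. This is a sketch under stated assumptions: the use of zlib.crc32 as the hash, the function names, and the choice of which virtual nodes move to a fifth physical node are all illustrative and are not taken from the original implementation.

```python
import zlib

VIRTUAL_NODES = 20  # fixed for the lifetime of the system

# virtual node -> physical node: 1~5 -> 1, 6~10 -> 2, 11~15 -> 3, 16~20 -> 4
mapping = {v: (v - 1) // 5 + 1 for v in range(1, VIRTUAL_NODES + 1)}

def virtual_node(key: str) -> int:
    """Hash a sharding key onto one of the virtual nodes (1..20).

    crc32 is an assumed hash; any stable hash would do. This value is
    what would be stored with the row to avoid re-hashing later.
    """
    return zlib.crc32(key.encode("utf-8")) % VIRTUAL_NODES + 1

def physical_node(key: str, node_map=mapping) -> int:
    """Resolve a key to its physical node through the virtual node."""
    return node_map[virtual_node(key)]

# Hypothetical expansion: a fifth physical node takes over one virtual
# node from each existing physical node. Keys never change virtual node.
new_mapping = dict(mapping)
for v in (5, 10, 15, 20):
    new_mapping[v] = 5

moved = [v for v in mapping if mapping[v] != new_mapping[v]]
print(moved)
print(physical_node("user_42"), physical_node("user_42", new_mapping))
```

Only rows whose stored virtual node is in moved need to be copied, so a migration can select them with a simple equality filter instead of recomputing the hash for every row.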
4. Practical Considerations
Sharding‑JDBC does not support batch inserts; workarounds include pre‑computing physical table names or using hash‑based routing.
Integration with Druid + MyBatis may require adjustments because Sharding‑JDBC wraps the data source.
Spring Boot projects need to configure IncrementIdGenerator and avoid class‑loader conflicts (e.g., remove spring-boot-devtools).
MyBatis XML generators must output logical table names instead of physical ones.
Sharding‑JDBC’s default Snowflake ID generator must be customized with appropriate datacenterId and workerId values, often coordinated via Zookeeper.
Read‑write splitting can be handled by MySQL’s ReplicationDriver or by setting datasource hints manually.
When logical tables are not sharded, custom ShardingStrategy implementations are required.
Global ID generation options include Zookeeper‑based distributed IDs, centralized ID services with pre‑generated tables, or business‑rule‑driven IDs.
After sharding, auto‑increment IDs no longer reflect insertion order; time‑based fields should be used for sorting.
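To make the Snowflake point concrete, here is a minimal sketch of a Snowflake-style generator. It is not Sharding‑JDBC's own generator: the class name, the custom epoch, and the bit layout (41-bit timestamp, 5-bit datacenterId, 5-bit workerId, 12-bit sequence, following the commonly described Twitter layout) are assumptions chosen for illustration. In practice the two IDs would be assigned by a coordinator such as Zookeeper so that no two processes share the same pair.

```python
import threading
import time

class SnowflakeSketch:
    """Illustrative Snowflake-style ID generator (assumed bit layout)."""

    EPOCH_MS = 1_500_000_000_000  # hypothetical custom epoch, in ms

    def __init__(self, datacenter_id: int, worker_id: int):
        if not 0 <= datacenter_id < 32 or not 0 <= worker_id < 32:
            raise ValueError("datacenter_id and worker_id must fit in 5 bits")
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = int(time.time() * 1000)
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the 12-bit sequence.
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = int(time.time() * 1000)
            else:
                self._sequence = 0
            self._last_ms = now
            return (((now - self.EPOCH_MS) << 22)
                    | (self.datacenter_id << 17)
                    | (self.worker_id << 12)
                    | self._sequence)

gen = SnowflakeSketch(datacenter_id=1, worker_id=7)
ids = [gen.next_id() for _ in range(5000)]
print(ids[0], ids[-1])
```

IDs from a single generator increase over time, but IDs produced by different workers within the same millisecond are ordered by worker bits rather than by insertion time, which is one more reason to sort on a time-based field.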
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
