
Mastering Database Sharding: Strategies, Pitfalls, and Practical Tips

This article reviews the background, side effects, sharding strategies, and key considerations of database partitioning, illustrating challenges such as merge‑sort pagination, deep‑paging performance, hash‑based routing, and integration issues with Sharding‑JDBC, while offering practical solutions and best‑practice recommendations.

dbaplus Community

Aug 4, 2018


Background

The author describes a recent project that split two massive tables, orders and coupons, shared by all of the group's subsidiaries. Rapid business growth had pushed storage capacity and read and write throughput to critical levels, so the team adopted horizontal sharding across logical databases, sized to support three to five years of growth.

Side Effects of Sharding

Sharding adds architectural complexity, and much of it hinges on choosing an effective sharding key. Good keys (e.g., user_id, order_type, merchant_id, channel_id, batch_id) align with business boundaries, so most queries can be routed to a single shard. Poor keys force queries to fan out to many shards and require extensive supporting scaffolding, which increases system complexity. Sorting and pagination across shards add further challenges.

1. Merge‑Sort Pagination

When data is distributed across multiple shards, each shard can return its own rows in order, but concatenating the per‑shard results does not produce a globally ordered result. To get a correct global order, the middleware must fetch enough rows from each shard and then merge‑sort them.

shard node 1: {1,3,5,7,9}
shard node 2: {2,4,6,8,10}

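Assume a page size of 4 and current page 1. The naive approach of reading the first two rows from each shard (pageSize / totalShards) works here only because the data is perfectly balanced. In the worst case, all four rows of the page could come from a single shard, so each shard must return its first pageNo * pageSize rows (here 4 rows per shard, 8 in total) before the merge. For deep pages the middleware therefore reads pageNo * pageSize * totalShards rows just to keep one page of them, which leads to high memory usage and performance bottlenecks.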
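The merge step can be sketched as a small Python simulation. This is not Sharding‑JDBC code. `fetch_shard_page` is a hypothetical stand‑in for a per‑shard `ORDER BY ... LIMIT` query, and each shard is assumed to be asked for its first offset + pageSize rows. The standard library's `heapq.merge` then performs the k‑way merge of the already sorted per‑shard results.

```python
import heapq
from itertools import islice


def fetch_shard_page(shard_rows, limit):
    """Hypothetical stand-in for `SELECT ... ORDER BY k LIMIT limit` on one shard.

    shard_rows is assumed to be sorted already, as an index scan would return it.
    """
    return shard_rows[:limit]


def global_page(shards, page_no, page_size):
    """Return one globally ordered page (page_no starts at 1) from several shards.

    Each shard must return its first offset + page_size rows, because in the
    worst case every row of the requested page lives on the same shard.
    """
    offset = (page_no - 1) * page_size
    per_shard = [fetch_shard_page(rows, offset + page_size) for rows in shards]
    merged = heapq.merge(*per_shard)  # k-way merge of already sorted lists
    return list(islice(merged, offset, offset + page_size))


shard1 = [1, 3, 5, 7, 9]
shard2 = [2, 4, 6, 8, 10]
print(global_page([shard1, shard2], 1, 4))  # [1, 2, 3, 4]
print(global_page([shard1, shard2], 2, 4))  # [5, 6, 7, 8]

# Skewed data: reading only two rows per shard would wrongly give [1, 2, 6, 7].
print(global_page([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], 1, 4))  # [1, 2, 3, 4]
```

For page 2, each shard is already asked for 8 rows to produce 4. This linear growth with page depth is exactly the deep‑paging cost discussed next.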

2. Deep‑Paging Performance Issues

To mitigate the cost of reading large numbers of rows, the article suggests two query‑adjustment strategies:

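Explicit filtering: narrow the result set by adding conditions such as a time range, payment status, or delivery status.

Implicit filtering: once paging goes deep, stop increasing the offset and use the sort field itself as a boundary instead. The value from the last row already returned (for example, its timestamp) is carried into the next query's condition.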

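In the original example, ordering rows from two shards by createDateTime interleaves them. The fix is to move the boundary forward with a condition such as where createDateTime > '2018-01-11 10:10:13', where that value is the createDateTime of the last row on the previous page. Subsequent pages can then be fetched without re‑reading rows that were already returned, and each shard only needs to return one page of rows.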
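A minimal keyset‑pagination sketch of the boundary technique follows, again simulated in Python rather than written against any middleware API. Rows are assumed to be `(create_time, id)` tuples. The id is assumed to be globally unique and serves as a tie‑breaker, so rows sharing a timestamp are neither skipped nor repeated. The sample data and the helper names `shard_query` and `next_page` are hypothetical.

```python
import heapq
from itertools import islice


def shard_query(rows, after, limit):
    """Simulate one shard running:
    WHERE (create_time, id) > after ORDER BY create_time, id LIMIT limit
    """
    ordered = sorted(rows)
    if after is not None:
        ordered = [r for r in ordered if r > after]  # tuple comparison: time, then id
    return ordered[:limit]


def next_page(shards, after, page_size):
    """Fetch the page that follows cursor `after` (None for the first page).

    Each shard returns at most page_size rows, no matter how deep the page is.
    """
    per_shard = [shard_query(rows, after, page_size) for rows in shards]
    page = list(islice(heapq.merge(*per_shard), page_size))
    cursor = page[-1] if page else after  # last row seen becomes the next boundary
    return page, cursor


# Hypothetical data: (createDateTime, id) rows that interleave across two shards.
shard1 = [("2018-01-11 10:10:10", 1), ("2018-01-11 10:10:12", 3), ("2018-01-11 10:10:14", 5)]
shard2 = [("2018-01-11 10:10:11", 2), ("2018-01-11 10:10:13", 4), ("2018-01-11 10:10:15", 6)]

page1, cursor = next_page([shard1, shard2], None, 4)
print([row_id for _, row_id in page1])  # [1, 2, 3, 4]
print(cursor[0])                        # 2018-01-11 10:10:13
page2, cursor = next_page([shard1, shard2], cursor, 4)
print([row_id for _, row_id in page2])  # [5, 6]
```

The trade‑off is that clients can only move forward from a cursor. They cannot jump straight to an arbitrary page number.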

3. Sharding Strategy

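The implementation combines mod‑based routing with pre‑sharding, which allocates a fixed number of shards up front. This keeps routing simple, but changing the number of nodes remaps many keys and makes data migration painful. To smooth migrations, the design adopts consistent hashing with virtual nodes: keys are hashed onto a fixed set of virtual nodes, and each physical node owns a range of them.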

physical nodes: node 1, node 2, node 3, node 4
virtual nodes:  node 1, node 2, ..., node 20

node mapping:
virtual nodes 1~5   -> physical node 1
virtual nodes 6~10  -> physical node 2
virtual nodes 11~15 -> physical node 3
virtual nodes 16~20 -> physical node 4
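The mapping above can be sketched as a two‑step lookup: the key's hash, taken mod 20, selects a virtual node, and a table maps that virtual node to a physical node. This sketch implements the fixed virtual‑node table shown, not a full hash ring. CRC32 is an assumed choice of stable hash, and the rebalancing rule in `add_physical_node` (each existing node hands over its last virtual node) is a hypothetical policy for illustration.

```python
import zlib

VIRTUAL_NODE_COUNT = 20

# Mapping from the table above: virtual 1~5 -> physical 1, 6~10 -> 2, 11~15 -> 3, 16~20 -> 4.
MAPPING = {v: (v - 1) // 5 + 1 for v in range(1, VIRTUAL_NODE_COUNT + 1)}


def route_hash(sharding_key: str) -> int:
    """Stable hash of the sharding key (CRC32 is an assumption; any stable hash works)."""
    return zlib.crc32(sharding_key.encode("utf-8"))


def virtual_node(h: int) -> int:
    """Map a hash value onto virtual nodes 1..20 (the "mod" step)."""
    return h % VIRTUAL_NODE_COUNT + 1


def physical_node(sharding_key: str, mapping=MAPPING) -> int:
    """Route a key: hash -> virtual node -> physical node."""
    return mapping[virtual_node(route_hash(sharding_key))]


def add_physical_node(mapping, new_node):
    """Hypothetical rebalancing: each existing node gives its last virtual node to new_node.

    Only keys on the reassigned virtual nodes change owner; all other keys stay put.
    """
    new_mapping = dict(mapping)
    for owner in sorted(set(mapping.values())):
        last_vnode = max(v for v, p in mapping.items() if p == owner)
        new_mapping[last_vnode] = new_node
    return new_mapping


expanded = add_physical_node(MAPPING, 5)
print(sorted(v for v in expanded if expanded[v] != MAPPING[v]))  # [5, 10, 15, 20]
```

With plain `hash % 4` changed to `hash % 5`, about 80% of uniformly hashed keys would change owner. Here only 4 of the 20 virtual nodes (about a fifth of the keys) move, and each of the five physical nodes ends up with four virtual nodes.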

Hash values are stored in the table to avoid costly re‑hashing during migrations.
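Storing the hash pays off when selecting rows to move. The sketch below assumes a hypothetical `route_hash` column written at insert time and the virtual‑node numbering 1~20 from the mapping above. The migration job filters on the stored value directly instead of re‑hashing every sharding key. `OrderRow` and `rows_to_migrate` are illustrative names, not part of any library.

```python
from dataclasses import dataclass

VIRTUAL_NODE_COUNT = 20


@dataclass
class OrderRow:
    order_id: int
    user_id: str
    route_hash: int  # hypothetical column: hash of the sharding key, stored at insert time


def rows_to_migrate(rows, moved_vnodes):
    """Select rows whose stored hash falls on a reassigned virtual node.

    Uses the persisted route_hash, so the sharding key is never re-hashed.
    Roughly equivalent SQL (sketch): WHERE MOD(route_hash, 20) + 1 IN (...)
    """
    moved = set(moved_vnodes)
    return [r for r in rows if r.route_hash % VIRTUAL_NODE_COUNT + 1 in moved]


rows = [OrderRow(1, "a", 4), OrderRow(2, "b", 0), OrderRow(3, "c", 9), OrderRow(4, "d", 24)]
print([r.order_id for r in rows_to_migrate(rows, [5, 10, 15, 20])])  # [1, 3, 4]
```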

4. Practical Considerations

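The Sharding‑JDBC version used at the time did not support batch (multi‑row) inserts. Workarounds include computing each row's physical table name up front, or routing rows by their hash, and then inserting into each physical table separately.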
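One way to apply the "pre‑compute the physical table" workaround is to group rows by their target table and build one multi‑row INSERT per table, outside the sharding middleware. The table naming scheme `orders_<n>`, the table count of 20, the CRC32 routing, and the column list are all hypothetical. The `%s` placeholders follow the DB‑API "format" parameter style used by common MySQL drivers.

```python
import zlib
from collections import defaultdict

TABLE_COUNT = 20  # hypothetical: physical tables orders_0 .. orders_19


def physical_table(user_id: str) -> str:
    """Hypothetical routing rule: CRC32 of the sharding key, mod the table count."""
    return f"orders_{zlib.crc32(user_id.encode('utf-8')) % TABLE_COUNT}"


def group_batch_insert(rows):
    """Group rows by target physical table and build one multi-row INSERT per table.

    Returns {table: (sql, params)}. Table names come from physical_table(), never
    from user input, so formatting them into the SQL string is safe here.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[physical_table(row["user_id"])].append(row)

    statements = {}
    for table, group in groups.items():
        placeholders = ", ".join(["(%s, %s, %s)"] * len(group))
        sql = f"INSERT INTO {table} (order_id, user_id, amount) VALUES {placeholders}"
        params = [v for r in group for v in (r["order_id"], r["user_id"], r["amount"])]
        statements[table] = (sql, params)
    return statements


batch = [
    {"order_id": 1, "user_id": "u1", "amount": 10},
    {"order_id": 2, "user_id": "u1", "amount": 20},
    {"order_id": 3, "user_id": "u2", "amount": 5},
]
for table, (sql, params) in group_batch_insert(batch).items():
    print(table, sql, params)
```

Each `(sql, params)` pair would then be passed to a cursor's `execute` on the connection that owns that physical table.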

Integration with Druid + MyBatis may require adjustments because Sharding‑JDBC wraps the data source.

Spring Boot projects need to configure IncrementIdGenerator and avoid class‑loader conflicts (e.g., remove spring-boot-devtools).

MyBatis XML generators must output logical table names instead of physical ones.

Sharding‑JDBC’s default Snowflake ID generator must be customized with appropriate datacenterId and workerId values, often coordinated via Zookeeper.
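To show why datacenterId and workerId must be unique per process, here is a sketch of the classic Twitter Snowflake layout: 41 bits of milliseconds since a custom epoch, 5 bits of datacenter, 5 bits of worker, and 12 bits of per‑millisecond sequence. This is a generic Snowflake, not Sharding‑JDBC's own generator. The epoch is an arbitrary assumption, and the IDs are simply passed in here. In practice they would be assigned by a coordinator such as ZooKeeper, as the article notes.

```python
import threading
import time


class Snowflake:
    """Generic Snowflake: 41-bit ms timestamp | 5-bit datacenter | 5-bit worker | 12-bit sequence."""

    EPOCH_MS = 1514764800000  # 2018-01-01T00:00:00Z, an arbitrary custom epoch (assumption)
    DATACENTER_BITS = 5
    WORKER_BITS = 5
    SEQUENCE_BITS = 12

    def __init__(self, datacenter_id: int, worker_id: int):
        if not 0 <= datacenter_id < (1 << self.DATACENTER_BITS):
            raise ValueError("datacenter_id out of range")
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return int(time.time() * 1000)

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self.last_ms:
                # Duplicate IDs become possible if the clock goes backwards, so refuse.
                raise RuntimeError("clock moved backwards; refusing to generate id")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & ((1 << self.SEQUENCE_BITS) - 1)
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: spin until the next one.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - self.EPOCH_MS) << 22)   # timestamp above the 22 low bits
                | (self.datacenter_id << 17)    # 5 bits above worker + sequence
                | (self.worker_id << 12)        # 5 bits above the sequence
                | self.sequence
            )


gen = Snowflake(datacenter_id=3, worker_id=7)
first = gen.next_id()
print((first >> 17) & 0x1F, (first >> 12) & 0x1F)  # 3 7
```

Two processes configured with the same datacenterId and workerId could emit identical IDs within the same millisecond, which is why the values need central coordination.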

Read‑write splitting can be handled by MySQL’s ReplicationDriver or by setting datasource hints manually.

When logical tables are not sharded, custom ShardingStrategy implementations are required.

Global ID generation options include Zookeeper‑based distributed IDs, centralized ID services with pre‑generated tables, or business‑rule‑driven IDs.
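The "centralized ID service with pre‑generated tables" option is often built as segment allocation. Each client reserves a block of IDs from a central row, conceptually `UPDATE ... SET max_id = max_id + step`, and then hands them out locally without further round trips. The sketch below simulates the central table in memory. `IdSegmentTable`, `SegmentIdGenerator`, and the step size are hypothetical names and parameters.

```python
import threading


class IdSegmentTable:
    """In-memory stand-in for a central ID table row (hypothetical).

    reserve() mimics an atomic UPDATE id_segment SET max_id = max_id + step.
    """

    def __init__(self, start: int = 0):
        self._max_id = start
        self._lock = threading.Lock()

    def reserve(self, step: int):
        with self._lock:
            low = self._max_id + 1
            self._max_id += step
            return low, self._max_id  # inclusive range [low, high]


class SegmentIdGenerator:
    """Hands out IDs from a locally cached segment, fetching a new one when exhausted."""

    def __init__(self, table: IdSegmentTable, step: int = 1000):
        self.table = table
        self.step = step
        self._next = 1
        self._high = 0  # empty segment, so the first call reserves one
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            if self._next > self._high:
                self._next, self._high = self.table.reserve(self.step)
            value = self._next
            self._next += 1
            return value


table = IdSegmentTable()
app_a = SegmentIdGenerator(table, step=3)
app_b = SegmentIdGenerator(table, step=3)
print([app_a.next_id(), app_b.next_id(), app_a.next_id(), app_a.next_id(), app_a.next_id()])
# [1, 4, 2, 3, 7]
```

The output illustrates the next point: the IDs are unique, but ID 4 was issued before IDs 2 and 3, so ID order does not match insertion order.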

After sharding, auto‑increment IDs no longer reflect insertion order; time‑based fields should be used for sorting.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please get in touch and we will review it promptly.

Database Sharding Sharding-JDBC Deep Paging Hash Routing Merge Sort Pagination

