Databases 15 min read

Database Sharding: Partitioning Strategies, Consistent Hashing, SQL Rewrite, and Merge Engine

Database sharding improves performance by partitioning tables vertically or horizontally and routing rows with consistent hashing, while ShardingJDBC rewrites SQL for physical tables, adds missing columns, corrects pagination, splits batch inserts, and merges results using sorting, grouping, aggregation, and pagination engines.

HelloTech
HelloTech
HelloTech
Database Sharding: Partitioning Strategies, Consistent Hashing, SQL Rewrite, and Merge Engine

As a company's business expands rapidly, the amount of data stored in relational databases grows dramatically, leading to slower access performance. A single relational database can become a bottleneck due to limited storage capacity, connection limits, and processing power. When a table reaches around 10 million rows or 100 GB, performance degrades even after adding read replicas or optimizing indexes.

Two common solutions are considered:

1. Hardware upgrade : increasing storage, CPU, etc. This approach is costly and offers limited improvement because the bottleneck remains at the database service layer.

2. Sharding (database and table partitioning) : reducing the amount of data per database or per table, thereby improving overall database performance.

Partitioning methods

Vertical partitioning : split a table by columns into multiple tables, each storing a subset of fields. This reduces I/O contention and the probability of table locks, allowing independent operations such as product browsing and configuration management. After vertical partitioning, horizontal partitioning can be applied to the resulting tables.

Horizontal partitioning : split a table's rows into multiple tables based on business rules, without changing the original table schema.

Consistent hash algorithm

Traditional hash‑modulo suffers from severe data movement when nodes are added or removed. Consistent hashing uses a ring ranging from 0 to 2^32‑1, with the start and end connected, forming a circular integer space. Each physical table (e.g., t_0, t_1, t_2, t_3) is placed on the ring. When a new record arrives, its sharding key is hashed to a point on the ring, and the nearest clockwise table stores the record.

When a new node (e.g., table t5) is inserted between existing nodes, only the data between the neighboring nodes needs to be moved, dramatically reducing migration volume compared with ordinary hash.

Data migration approaches

1. Stop‑the‑world migration

2. Dual‑write (write to both old and new nodes during transition)

Routing engine in ShardingJDBC (illustrated by diagram in source).

Rewrite engine

In sharding scenarios, logical table names in SQL must be rewritten to the actual physical table names after routing. Identifiers that need rewriting include table names, index names, and schema names.

Example:

SELECT order_id FROM t_order WHERE order_id=1;

If the sharding key order_id=1 routes to shard t_order_1 , the rewritten SQL becomes:

SELECT order_id FROM t_order_1 WHERE order_id=1;

Column supplement

Needed when the original SELECT does not contain columns required for result merging (e.g., GROUP BY, ORDER BY) or when aggregating with AVG . Example for ORDER BY:

SELECT order_id FROM t_order ORDER BY user_id;

Rewritten to include the ordering column:

SELECT order_id, user_id AS ORDER_BY_DERIVED_0 FROM t_order ORDER BY user_id;

For AVG , rewrite to COUNT and SUM :

SELECT AVG(price) FROM t_order WHERE user_id=1;

becomes

SELECT COUNT(price) AS AVG_DERIVED_COUNT_0, SUM(price) AS AVG_DERIVED_SUM_0 FROM t_order WHERE user_id=1;

Pagination correction

In a sharded environment, a simple LIMIT 10,10 on each shard does not produce correct global pagination. The query must be rewritten to fetch enough rows from each shard (e.g., LIMIT 0,3 for page 2 of size 2) and then merge results.

Example of incorrect pagination:

SELECT score FROM t_score ORDER BY score DESC LIMIT 1,2;

Correct approach: rewrite to LIMIT 0,3 , retrieve the first three rows from each shard, then merge and select the desired page.

Batch split for INSERT

When a batch INSERT spans multiple shards, the SQL must be rewritten so that each shard receives only the rows belonging to it. Example original statement:

INSERT INTO t_order (order_id, xxx) VALUES (1, 'xxx'), (2, 'xxx'), (3, 'xxx');

If shards are split by odd/even order_id , the rewritten statements become:

INSERT INTO t_order_0 (order_id, xxx) VALUES (2, 'xxx');

INSERT INTO t_order_1 (order_id, xxx) VALUES (1, 'xxx'), (3, 'xxx');

Similar rewriting applies to IN queries to improve performance.

Merge engine

The merge engine handles result merging from multiple shards. Types include:

Sorting merge : each shard returns an ordered result set; a priority queue merges them efficiently (similar to merge sort).

Grouping merge : can be streaming (when ORDER BY matches GROUP BY) or in‑memory (when they differ). Streaming merge processes groups on the fly; in‑memory merge buffers groups before emitting results.

Aggregation merge : MAX/MIN: compare values across shards and return the extreme. SUM/COUNT: sum values from all shards. AVG: compute SUM and COUNT per shard, then calculate the final average.

Pagination merge : applied as a decorator over other merge types; it filters out unnecessary rows after merging. ShardingSphere rewrites large LIMIT offsets (e.g., LIMIT 10000000,10 ) to LIMIT 0,10000010 to ensure correctness without loading all rows into memory.

For efficient pagination, it is recommended to use ID‑based range queries (e.g., SELECT * FROM t_order WHERE id > 100000 AND id <= 100010 ORDER BY id; ) or use the last retrieved ID as the next page's starting point.

database shardingconsistent hashinghorizontal partitioningMERGE EngineSQL Rewritevertical partitioning
HelloTech
Written by

HelloTech

Official Hello technology account, sharing tech insights and developments.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.