Databases 5 min read

How to Efficiently Paginate Across Sharded Databases Without Heavy Overhead

This article explains a practical method for performing pagination on a sharded table by querying each database up to the target offset, merging and sorting the results, and extracting the required page while discussing performance drawbacks and an optimization when data distribution is uniform.

Java High-Performance Architecture

Jan 16, 2016

How to Efficiently Paginate Across Sharded Databases Without Heavy Overhead

When a table grows too large, it is often split across multiple databases (sharding). For example, the table tb1 can be divided into three shards: shard1, shard2, and shard3.

To execute pagination across these shards, a common strategy is to query each shard for rows from the beginning up to the last row needed for the final result set, merge the results, sort them, and then pick the rows that belong to the requested page.

Suppose we want to retrieve the 5th and 6th rows ordered by c1. The original SQL is SELECT c1 FROM tb1 ORDER BY c1 LIMIT 4, 2. This query is run on each shard, and the intermediate results are combined as illustrated below.

Assuming the shards return the following partial results:

The system then scans the merged list in order, discarding rows until the offset is reached. The process for each row is shown in the subsequent diagrams.

Rows 0‑3 are skipped because they are before the 4‑th offset. When the 4‑th row (value 4) is encountered, it is added to the result set, followed by the 5‑th row (value 5). At this point the desired two rows have been collected, and the search stops.

This approach is straightforward but has a major drawback: when the starting offset is very large, each shard must scan a huge number of rows, leading to unacceptable performance.

For example, the query SELECT c1 FROM tb1 ORDER BY c1 LIMIT 100000000, 2 forces every shard to read 100,000,002 rows before merging and sorting.

If the data distribution across shards is roughly even, the workload can be reduced by dividing the offset among the shards. With three shards, the offset 9,999,999 can be split into roughly 3,333,333 per shard.

Each shard runs a reduced query:

The partial results yield a minimum and maximum value (e.g., 4 and 18). Using these bounds, a second round of queries narrows the search to the relevant range across all shards.

From the combined result we locate the offset of the first row that belongs to the target page (e.g., offset 3,333,331 corresponds to the overall 9,999,996‑th record). Starting from that point, we skip the first three rows and retrieve the next four rows to complete the pagination.

While this method reduces the amount of data each shard must process, it still requires additional queries and assumes roughly uniform data distribution.

In summary, paginating across sharded databases can be implemented by merging per‑shard results, but large offsets cause heavy scanning; splitting the offset proportionally among shards can mitigate the cost when data is evenly distributed.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

performance SQL Sharding pagination distributed databases

Written by

Java High-Performance Architecture

Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.