Cross-Database/Table Pagination Query Solutions for Sharded Systems
This article examines the challenges of pagination after database sharding and presents three practical solutions—global query, jump‑page prohibition, and two‑step query—detailing their implementations, trade‑offs, and performance considerations for large‑scale order services.
Hello everyone, I am Chen.
As data volume grows, sharding (splitting databases and tables) becomes necessary to improve read/write performance, but it introduces many challenges; this article discusses solutions for cross‑database/table pagination queries.
For other issues after sharding, see the author's previous article on sharding.
Architecture Background
The author previously worked on a large e‑commerce order service. In the early stage a single database/table sufficed, but as data grew the performance degraded. The team tried various optimizations such as cold‑hot separation and query separation, which helped only marginally under daily million‑row growth.
Business requirements before sharding:
C‑end users need to query all their own orders.
Backend admins and customer service need to query orders by order number, user info, etc.
B‑end merchants need to query orders belonging to their shops.
Prioritizing C‑end users, the sharding key uid was chosen.
Using uid as the sharding key satisfies C‑end queries; for admin and B‑end queries the data is duplicated to ES or HBase for easy access.
Assume the order table is split by hash(uid%2+1) into two tables, as shown in the diagram.
To paginate by order time (full‑table scan), the single‑table SQL is:
select * from t_order order by time asc limit 5,5;The offset is 5 (skip the first 5 rows).
Data in t_order_1 and t_order_2 is illustrated by the following images.
Below are three pagination strategies for the sharded architecture.
1. Global Query Method
Execute the original pagination SQL on each shard:
select * from t_order_1 order by time asc limit 5,5;
select * from t_order_2 order by time asc limit 5,5;Merge the results in memory and sort again, but the naive merge yields incorrect results. The correct approach is to fetch the first two pages from each shard (e.g., limit 0,10) and then merge and pick the desired page.
Drawbacks:
Data returned grows with the page number, leading to low performance.
The service layer must perform a second sort, increasing CPU and memory usage.
Sharding‑JDBC provides an optimization using streaming processing plus merge sort to reduce memory consumption.
2. Prohibit Jump‑Page Method
When data volume is large, only allow “next‑page” queries. After retrieving the first page, use its maximum time value as the lower bound for the next query, e.g.:
select * from t_order_1 where time > 1664088392 order by time asc limit 5;
select * from t_order_2 where time > 1664088392 order by time asc limit 5;This returns only one page of data regardless of the total page count, greatly improving performance. The downside is that users cannot jump directly to an arbitrary page.
3. Two‑Step Query Method
This method balances performance and functionality.
Step 1 – SQL Rewrite
Rewrite the original pagination SQL to fetch a portion of each shard based on a halved global offset (e.g., offset = global_offset / 2 = 2):
select * from t_order_1 order by time asc limit 2,5;
select * from t_order_2 order by time asc limit 2,5;Collect the results and find the minimum time value among the two shards (called time_min).
Step 2 – Find Minimum Time
Assume time_min = 1664088392 from t_order_2.
Step 3 – Second Query Using BETWEEN
Query each shard for rows between time_min and the maximum time observed in the first step:
select * from t_order_1 where time between $time_min and 1664088581 order by time asc;
select * from t_order_2 where time between $time_min and 1664088481 order by time asc;Merge the two result sets, sort them, and locate the global offset of time_min. In the example the global offset is 5, so the original pagination query limit 5,5 will return the rows starting from offset 6.
Advantages: precise data retrieval with very small per‑request payload, performance does not degrade with page number.
Disadvantages: requires two round‑trips to the database.
Summary
The article presented three pagination solutions after sharding:
Global Query Method – simplest but performance drops as page number increases.
Prohibit Jump‑Page Method – higher performance by returning only one page at a time, but cannot jump pages.
Two‑Step Query Method – precise and efficient for balanced data distribution, at the cost of an extra query.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
