How to Efficiently Paginate Across Sharded Tables: Three Proven Methods
This article explains why an e-commerce order service moved to sharded tables. It then compares three pagination strategies for sharded data (global query, no-skip pagination, and two-phase query), covering their SQL rewrites, execution steps, performance trade-offs, and worked examples.
Architecture Background
The example is an e-commerce order service that began with a single database and a single table. Once order volume grew to millions of rows per day, performance degraded, so the service moved to sharding across databases and tables. The sharding key is uid, which serves consumer-facing (C-end) queries by user. Back-office and merchant-facing (B-end) queries are served from a copy of the data in Elasticsearch (ES) or HBase.
Orders are split across two tables by routing each order to table number uid % 2 + 1, which yields t_order_1 and t_order_2.
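The routing rule fits in one line of code. The sketch below assumes the two tables are named exactly as above. `order_table_for` and `NUM_SHARDS` are hypothetical names used for illustration; they are not part of the original service.

```python
# Hypothetical routing helper for the uid % 2 + 1 rule described above.
NUM_SHARDS = 2


def order_table_for(uid: int) -> str:
    """Return the physical table that stores orders for this uid."""
    return f"t_order_{uid % NUM_SHARDS + 1}"


print(order_table_for(1001))  # t_order_2
print(order_table_for(1002))  # t_order_1
```

Every order for a given uid lands in one table, so a user's own order list can be served from a single shard. Only queries that do not filter by uid, such as the cross-user pagination below, must touch both tables.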
Single‑Table Pagination Example
When ordering by time ascending, a typical MySQL pagination query for the second page (5 rows per page) looks like:
select * from t_order order by time asc limit 5,5;
Here offset=5 skips the first five rows, so the query returns rows 6 to 10.
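The sketch below makes the offset arithmetic concrete. It runs the same query against an in-memory SQLite table, used as a stand-in for MySQL; SQLite also accepts the `limit offset, count` form. The twelve rows and the `page` helper are illustrative assumptions.

```python
import sqlite3

# Illustrative single table (toy data): 12 orders whose id and time both run from 1 to 12.
conn = sqlite3.connect(":memory:")
conn.execute("create table t_order (id integer primary key, time integer)")
conn.executemany(
    "insert into t_order (id, time) values (?, ?)",
    [(i, i) for i in range(1, 13)],
)


def page(page_no: int, page_size: int = 5) -> list[int]:
    """Return the ids on a 1-based page using offset = (page_no - 1) * page_size."""
    offset = (page_no - 1) * page_size
    rows = conn.execute(
        "select id from t_order order by time asc limit ?, ?",
        (offset, page_size),
    ).fetchall()
    return [row[0] for row in rows]


print(page(2))  # [6, 7, 8, 9, 10]
```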
1. Global Query Method
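Rewrite the query so that each shard returns its first offset + limit rows (5 + 5 = 10). Running the original limit 5,5 on each shard would be wrong, because some of the global rows 6 to 10 may be among the first five rows of one shard:
select * from t_order_1 order by time asc limit 0,10;
select * from t_order_2 order by time asc limit 0,10;
Merge the two result sets in memory, sort them globally, skip the first five rows, and take the next five. This approach is simple but suffers from two major drawbacks: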
Each shard returns offset + limit rows, so the data transferred grows with the page number and deep pages perform poorly.
The service layer must perform an additional sort, increasing CPU and memory usage.
Optimizations such as Sharding-JDBC's streaming merge sort, which consumes each shard's already-sorted results incrementally, can reduce memory pressure.
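The rewrite-and-merge steps can be sketched in Python under a few illustrative assumptions:

- Two tables in one in-memory SQLite database stand in for the two shards.
- The 20 toy orders (uid 1 to 20, time = 1000 + uid) are invented for the example.
- `heapq.merge` plays the role of a merge sort over the already-sorted shard results. This is not Sharding-JDBC's actual implementation.

```python
import heapq
import sqlite3
from itertools import islice

SHARDS = ["t_order_1", "t_order_2"]


def make_shards() -> sqlite3.Connection:
    """Toy setup (hypothetical data): both 'shards' live in one in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    for table in SHARDS:
        conn.execute(f"create table {table} (uid integer, time integer)")
    for uid in range(1, 21):
        # Route by uid % 2 + 1, as in the article.
        conn.execute(
            f"insert into t_order_{uid % 2 + 1} (uid, time) values (?, ?)",
            (uid, 1000 + uid),
        )
    return conn


def global_query_page(conn: sqlite3.Connection, offset: int, limit: int) -> list[int]:
    """Global query method: fetch offset + limit rows per shard, merge, then slice."""
    per_shard = [
        [
            row[0]
            for row in conn.execute(
                f"select time from {table} order by time asc limit 0, ?",
                (offset + limit,),
            )
        ]
        for table in SHARDS
    ]
    # Each shard's list is already sorted, so a k-way merge yields global order
    # without re-sorting everything.
    merged = heapq.merge(*per_shard)
    return list(islice(merged, offset, offset + limit))


conn = make_shards()
print(global_query_page(conn, 5, 5))  # [1006, 1007, 1008, 1009, 1010]
```

Each shard returns offset + limit rows, so the rows fetched for deeper pages grow linearly with the page number. This is the first drawback listed above.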
2. No‑Skip Pagination Method
Instead of allowing arbitrary page jumps, this method only fetches the next page. After the first page is returned, its maximum time value (e.g., 1664088392) becomes the exclusive lower bound for the next query on every shard:
select * from t_order_1 where time > 1664088392 order by time asc limit 5;
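select * from t_order_2 where time > 1664088392 order by time asc limit 5;
The two result sets are merged and sorted, and the first five rows form the second page. Each shard returns at most one page of rows no matter how deep the user has paged, which dramatically reduces the data transferred. The trade-off is that the user cannot jump directly to a later page, because the cursor value for that page is unknown.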
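The following is a minimal sketch of cursor-style "next page" fetching. It reuses the same toy two-shard SQLite setup with invented data, and `next_page` is a hypothetical helper. It assumes time values are unique and positive, so 0 can serve as the cursor for the first page. With duplicate times, a tie-breaker column would be needed to avoid skipping rows at page boundaries.

```python
import heapq
import sqlite3
from itertools import islice

SHARDS = ["t_order_1", "t_order_2"]


def make_shards() -> sqlite3.Connection:
    """Toy setup (hypothetical data): both 'shards' live in one in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    for table in SHARDS:
        conn.execute(f"create table {table} (uid integer, time integer)")
    for uid in range(1, 21):
        conn.execute(
            f"insert into t_order_{uid % 2 + 1} (uid, time) values (?, ?)",
            (uid, 1000 + uid),
        )
    return conn


def next_page(conn: sqlite3.Connection, after_time: int, limit: int) -> tuple[list[int], int]:
    """Return the page after `after_time` and the cursor for the following page.

    Assumes unique, positive time values (0 acts as the first-page cursor).
    """
    per_shard = [
        [
            row[0]
            for row in conn.execute(
                f"select time from {table} where time > ? order by time asc limit ?",
                (after_time, limit),
            )
        ]
        for table in SHARDS
    ]
    # Each shard contributes at most `limit` rows; keep the smallest `limit` overall.
    page = list(islice(heapq.merge(*per_shard), limit))
    last_time = page[-1] if page else after_time
    return page, last_time


conn = make_shards()
page1, cursor = next_page(conn, 0, 5)       # [1001, ..., 1005], cursor 1005
page2, cursor = next_page(conn, cursor, 5)  # [1006, ..., 1010], cursor 1010
```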
3. Two‑Phase Query Method
Step 1 – Rewrite Original SQL
Divide the global offset by the number of shards, rounding down, and use the result as each shard's offset; the limit stays the same. For two shards, offset = 5 / 2 = 2 (integer division):
select * from t_order_1 order by time asc limit 2,5;
select * from t_order_2 order by time asc limit 2,5;
Execute both queries. Each shard returns five rows, starting after its own first two rows.
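Here is step 1 as a sketch on the same toy data: two SQLite tables stand in for the shards, and each order has time = 1000 + uid. The `phase_one` helper is a hypothetical name. Integer division gives each shard's offset, and the limit is left unchanged.

```python
import sqlite3

SHARDS = ["t_order_1", "t_order_2"]


def make_shards() -> sqlite3.Connection:
    """Toy setup (hypothetical data): both 'shards' live in one in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    for table in SHARDS:
        conn.execute(f"create table {table} (uid integer, time integer)")
    for uid in range(1, 21):
        conn.execute(
            f"insert into t_order_{uid % 2 + 1} (uid, time) values (?, ?)",
            (uid, 1000 + uid),
        )
    return conn


def phase_one(conn: sqlite3.Connection, offset: int, limit: int) -> dict[str, list[int]]:
    """Run 'limit offset // N, limit' on every shard and return each shard's times."""
    shard_offset = offset // len(SHARDS)  # 5 // 2 == 2 for the article's example
    return {
        table: [
            row[0]
            for row in conn.execute(
                f"select time from {table} order by time asc limit ?, ?",
                (shard_offset, limit),
            )
        ]
        for table in SHARDS
    }


conn = make_shards()
print(phase_one(conn, 5, 5))
# {'t_order_1': [1006, 1008, 1010, 1012, 1014], 't_order_2': [1005, 1007, 1009, 1011, 1013]}
```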
Step 2 – Determine Minimum Time (time_min)
Identify the smallest time value among the two result sets. In the example, time_min = 1664088392 comes from t_order_2.
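Finding time_min only requires the first row of each phase-1 result, because every shard's rows come back sorted. The same pass can also collect each shard's maximum, which step 3 needs. The input below is the toy phase-1 output of an invented 20-row data set, not the article's figures. `phase_one_bounds` is a hypothetical helper.

```python
def phase_one_bounds(phase1: dict[str, list[int]]) -> tuple[int, dict[str, int]]:
    """Return time_min across all shards and each shard's maximum phase-1 time.

    Assumes every list is sorted ascending and non-empty.
    """
    time_min = min(rows[0] for rows in phase1.values())
    shard_max = {table: rows[-1] for table, rows in phase1.items()}
    return time_min, shard_max


# Toy phase-1 results (hypothetical data, not the article's timestamps).
phase1 = {
    "t_order_1": [1006, 1008, 1010, 1012, 1014],
    "t_order_2": [1005, 1007, 1009, 1011, 1013],
}
time_min, shard_max = phase_one_bounds(phase1)
print(time_min)   # 1005 (from t_order_2)
print(shard_max)  # {'t_order_1': 1014, 't_order_2': 1013}
```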
Step 3 – Second‑Phase Query Using BETWEEN
Query each shard for rows between time_min and the maximum time that the same shard returned in the first phase:
select * from t_order_1 where time between $time_min and 1664088581 order by time asc;
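select * from t_order_2 where time between $time_min and 1664088481 order by time asc;
The results are merged and sorted. For each shard, the number of rows that precede time_min equals its phase-1 offset (2) minus the extra rows that the second query returned before that shard's first phase-1 row. Summing these counts over all shards gives the global offset of time_min. From the merged result, skip (5 minus that global offset) rows and take five. This yields the page the original single-table query would have returned, provided the data is spread evenly enough across shards that no shard holds page rows beyond its phase-1 maximum.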
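Putting the three steps together gives the end-to-end sketch below, built on the toy two-shard SQLite setup. `two_phase_page`, `fetch_times`, and the data are hypothetical. The comment on `min_offset` spells out how the global offset of time_min is recovered. Like the method itself, the sketch assumes unique time values. It is also exact only when no shard holds page rows beyond its phase-1 maximum, which is true for the evenly interleaved demo data used here.

```python
import heapq
import sqlite3
from itertools import islice

SHARDS = ["t_order_1", "t_order_2"]


def make_shards(rows: list[tuple[int, int]]) -> sqlite3.Connection:
    """Load toy (uid, time) rows into two in-memory tables routed by uid % 2 + 1."""
    conn = sqlite3.connect(":memory:")
    for table in SHARDS:
        conn.execute(f"create table {table} (uid integer, time integer)")
    for uid, ts in rows:
        conn.execute(f"insert into t_order_{uid % 2 + 1} (uid, time) values (?, ?)", (uid, ts))
    return conn


def fetch_times(conn: sqlite3.Connection, sql: str, params: tuple) -> list[int]:
    return [row[0] for row in conn.execute(sql, params)]


def two_phase_page(conn: sqlite3.Connection, offset: int, limit: int) -> list[int]:
    shard_offset = offset // len(SHARDS)
    # Step 1: rewritten query with the per-shard offset.
    phase1 = {
        t: fetch_times(conn, f"select time from {t} order by time asc limit ?, ?",
                       (shard_offset, limit))
        for t in SHARDS
    }
    if not all(phase1.values()):
        raise ValueError("sketch assumes every shard returns rows in phase 1")
    # Step 2: smallest time across all phase-1 results.
    time_min = min(rows[0] for rows in phase1.values())
    # Step 3: range query per shard, bounded by that shard's own phase-1 maximum.
    phase2 = {
        t: fetch_times(conn, f"select time from {t} where time between ? and ? order by time asc",
                       (time_min, phase1[t][-1]))
        for t in SHARDS
    }
    # Rows in a shard that precede time_min = shard_offset minus the extra rows
    # phase 2 found before that shard's first phase-1 row. Summed over shards,
    # this is time_min's offset in the global order.
    min_offset = sum(shard_offset - (len(phase2[t]) - len(phase1[t])) for t in SHARDS)
    skip = offset - min_offset
    merged = heapq.merge(*phase2.values())
    return list(islice(merged, skip, skip + limit))


DEMO_ROWS = [(uid, 1000 + uid) for uid in range(1, 21)]
conn = make_shards(DEMO_ROWS)
print(two_phase_page(conn, 5, 5))  # [1006, 1007, 1008, 1009, 1010]
```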
Advantages & Drawbacks
Advantages: it transfers far less data than the global query, and performance stays stable as the page number grows. With evenly distributed data, it returns exactly the required page. Drawbacks: it needs two round-trips to each shard.
Conclusion
The article compares three pagination strategies for sharded tables:
Global query – simple but degrades with higher page numbers.
No‑skip pagination – high performance but disallows arbitrary page jumps.
Two‑phase query – accurate and efficient for balanced data distributions, at the cost of an extra query.
Choosing the right method depends on the specific performance requirements and business constraints.
This article has been distilled and summarized from source material and republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.