How to Optimize Pagination for Massive MySQL Tables with Sharding
When a single MySQL table grows to millions of rows, offset‑based pagination becomes painfully slow, so this article explains why, measures the performance impact, and presents three practical sharding‑aware pagination strategies—including global query, jump‑page prohibition, and a two‑step query method—along with concrete SQL rewrites and optimization tips.
Impact of Large Offsets on MySQL Pagination
When a table grows to millions of rows, the LIMIT offset, count clause forces MySQL to scan and discard offset rows before returning the result set. The cost grows linearly with the offset, turning a millisecond query into a second‑scale query.
Example measurements on a table with >1 M rows (returning 100 rows per query):
Offset 0 → 18 ms
Offset 1,000 → 23 ms
Offset 10,000 → 54 ms
Offset 100,000 → 268 ms
Offset 500,000 → 1.16 s
Offset 1,000,000 → 2.35 s
Sharding Architecture
In a large e‑commerce system the order service was sharded by user ID ( uid) to satisfy the C‑end requirement of querying a user’s own orders. The logical order table is split into two physical tables:
t_order_1 = hash(uid % 2 + 1) = 1
t_order_2 = hash(uid % 2 + 1) = 2Pagination on a Single Table
Standard pagination (page size 5, second page) uses:
SELECT * FROM t_order ORDER BY time ASC LIMIT 5,5;After sharding the same logical request must be executed on each shard and the partial results merged.
Global Query Method
Run the original query on every shard:
SELECT * FROM t_order_1 ORDER BY time ASC LIMIT 5,5;
SELECT * FROM t_order_2 ORDER BY time ASC LIMIT 5,5;Merge the two result sets in memory and then take the desired page. Drawbacks:
Data volume returned grows with the page number (large offsets cause huge transfers).
The service layer must re‑sort the combined data, increasing CPU and memory usage.
Sharding‑JDBC Pagination Correction
When the offset is large, rewriting each shard with the same LIMIT offset, count is incorrect because each shard returns only its own page. The correct rewrite expands the offset to cover all preceding pages:
SELECT * FROM t_order_1 ORDER BY time ASC LIMIT 0,10;
SELECT * FROM t_order_2 ORDER BY time ASC LIMIT 0,10;After fetching the first two pages from each shard, merge and extract the global second page.
Performance Bottleneck of Large Offsets
Example of a costly query on a single table:
SELECT * FROM t_order ORDER BY id LIMIT 100000,10;MySQL must skip 100 000 rows. In a two‑shard environment the query is rewritten to fetch the first 100 000 rows from each shard and then keep the last 10 after sorting, dramatically increasing data transferred.
Pagination Optimizations
Keyset pagination (ID range) avoids offset scans when IDs are monotonic:
SELECT * FROM t_order WHERE id > 10000 AND id < 100010 ORDER BY id;Or use the last seen ID for the next page:
SELECT * FROM t_order WHERE id > 10000 LIMIT 10;Prohibit Jump‑Page Queries
Expose only a “next‑page” API (e.g., pull‑to‑refresh). Each request returns a single page, keeping data volume constant regardless of how deep the user navigates. The trade‑off is the inability to jump directly to an arbitrary page.
Two‑Step Query Method (Precise, Low‑Transfer)
First query with adjusted offset per shard . For a global offset of 5 across two shards, each shard uses LIMIT 2,5 (global offset ÷ shard count):
SELECT * FROM t_order_1 ORDER BY time ASC LIMIT 2,5;
SELECT * FROM t_order_2 ORDER BY time ASC LIMIT 2,5;Determine the minimum timestamp among the two result sets (e.g., 1664088392) and call it time_min.
Second query using a range that spans from time_min to the maximum timestamp observed in each shard:
SELECT * FROM t_order_1 WHERE time BETWEEN $time_min AND 1664088581 ORDER BY time ASC;
SELECT * FROM t_order_2 WHERE time BETWEEN $time_min AND 1664088481 ORDER BY time ASC;Merge the two result sets, locate the global offset of time_min (here 5), and then extract the required page.
This approach guarantees that each request transfers only a tiny amount of data, independent of the page number, at the cost of issuing two queries per page.
Comparison of Three Strategies
Global Query Method : Simple implementation but performance degrades sharply as the page number grows because each shard returns increasingly large result sets.
Prohibit Jump‑Page Method : Business‑level change that returns only one page per request; high performance but cannot jump to distant pages.
Two‑Step Query Method : Precise, low data transfer, suitable when data is evenly distributed; requires two queries per page.
Choosing the appropriate method depends on workload characteristics, data size, and acceptable complexity.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
