Databases 16 min read

How to Optimize Pagination for Massive MySQL Tables with Sharding

When a single MySQL table grows to millions of rows, offset‑based pagination becomes painfully slow, so this article explains why, measures the performance impact, and presents three practical sharding‑aware pagination strategies—including global query, jump‑page prohibition, and a two‑step query method—along with concrete SQL rewrites and optimization tips.

dbaplus Community
dbaplus Community
dbaplus Community
How to Optimize Pagination for Massive MySQL Tables with Sharding

Impact of Large Offsets on MySQL Pagination

When a table grows to millions of rows, the LIMIT offset, count clause forces MySQL to scan and discard offset rows before returning the result set. The cost grows linearly with the offset, turning a millisecond query into a second‑scale query.

Example measurements on a table with >1 M rows (returning 100 rows per query):

Offset 0 → 18 ms

Offset 1,000 → 23 ms

Offset 10,000 → 54 ms

Offset 100,000 → 268 ms

Offset 500,000 → 1.16 s

Offset 1,000,000 → 2.35 s

Sharding Architecture

In a large e‑commerce system the order service was sharded by user ID ( uid) to satisfy the C‑end requirement of querying a user’s own orders. The logical order table is split into two physical tables:

t_order_1 = hash(uid % 2 + 1) = 1
t_order_2 = hash(uid % 2 + 1) = 2
Sharding diagram
Sharding diagram

Pagination on a Single Table

Standard pagination (page size 5, second page) uses:

SELECT * FROM t_order ORDER BY time ASC LIMIT 5,5;

After sharding the same logical request must be executed on each shard and the partial results merged.

Global Query Method

Run the original query on every shard:

SELECT * FROM t_order_1 ORDER BY time ASC LIMIT 5,5;
SELECT * FROM t_order_2 ORDER BY time ASC LIMIT 5,5;

Merge the two result sets in memory and then take the desired page. Drawbacks:

Data volume returned grows with the page number (large offsets cause huge transfers).

The service layer must re‑sort the combined data, increasing CPU and memory usage.

Sharding‑JDBC Pagination Correction

When the offset is large, rewriting each shard with the same LIMIT offset, count is incorrect because each shard returns only its own page. The correct rewrite expands the offset to cover all preceding pages:

SELECT * FROM t_order_1 ORDER BY time ASC LIMIT 0,10;
SELECT * FROM t_order_2 ORDER BY time ASC LIMIT 0,10;

After fetching the first two pages from each shard, merge and extract the global second page.

Performance Bottleneck of Large Offsets

Example of a costly query on a single table:

SELECT * FROM t_order ORDER BY id LIMIT 100000,10;

MySQL must skip 100 000 rows. In a two‑shard environment the query is rewritten to fetch the first 100 000 rows from each shard and then keep the last 10 after sorting, dramatically increasing data transferred.

Pagination Optimizations

Keyset pagination (ID range) avoids offset scans when IDs are monotonic:

SELECT * FROM t_order WHERE id > 10000 AND id < 100010 ORDER BY id;

Or use the last seen ID for the next page:

SELECT * FROM t_order WHERE id > 10000 LIMIT 10;

Prohibit Jump‑Page Queries

Expose only a “next‑page” API (e.g., pull‑to‑refresh). Each request returns a single page, keeping data volume constant regardless of how deep the user navigates. The trade‑off is the inability to jump directly to an arbitrary page.

Two‑Step Query Method (Precise, Low‑Transfer)

First query with adjusted offset per shard . For a global offset of 5 across two shards, each shard uses LIMIT 2,5 (global offset ÷ shard count):

SELECT * FROM t_order_1 ORDER BY time ASC LIMIT 2,5;
SELECT * FROM t_order_2 ORDER BY time ASC LIMIT 2,5;

Determine the minimum timestamp among the two result sets (e.g., 1664088392) and call it time_min.

Second query using a range that spans from time_min to the maximum timestamp observed in each shard:

SELECT * FROM t_order_1 WHERE time BETWEEN $time_min AND 1664088581 ORDER BY time ASC;
SELECT * FROM t_order_2 WHERE time BETWEEN $time_min AND 1664088481 ORDER BY time ASC;

Merge the two result sets, locate the global offset of time_min (here 5), and then extract the required page.

This approach guarantees that each request transfers only a tiny amount of data, independent of the page number, at the cost of issuing two queries per page.

Comparison of Three Strategies

Global Query Method : Simple implementation but performance degrades sharply as the page number grows because each shard returns increasingly large result sets.

Prohibit Jump‑Page Method : Business‑level change that returns only one page per request; high performance but cannot jump to distant pages.

Two‑Step Query Method : Precise, low data transfer, suitable when data is evenly distributed; requires two queries per page.

Choosing the appropriate method depends on workload characteristics, data size, and acceptable complexity.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

performanceSQLmysqlpagination
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.