
Deep Pagination in MySQL and Distributed Databases: Problems and VtDriver Optimizations

The article explains how large‑offset (deep) pagination in MySQL and distributed databases degrades performance and increases memory and bandwidth usage, and describes VtDriver's streaming and SQL‑rewrite techniques along with practical recommendations such as range queries and sub‑queries to mitigate these issues.

JD Retail Technology

Front-end pages paginate results to avoid loading an entire dataset at once; in MySQL this is typically done with the LIMIT clause.

1. MySQL Deep Pagination

When the offset in a LIMIT clause is very large, MySQL must skip many rows before returning the requested ones, causing severe performance loss. Example:

SELECT * FROM t_order ORDER BY id LIMIT 1000000, 10

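Even though id is indexed, MySQL still has to read and discard the first 1,000,000 rows before it can return the 10 requested ones, so the cost grows with the offset. This pattern is referred to as deep pagination.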

2. Deep Pagination in Distributed Databases

In a sharded environment (e.g., an elastic database), suppose the table t_order is partitioned by t_col1. To fetch the 2 rows that follow the second record, a naïve approach would run the following on every shard and merge the results:

SELECT * FROM t_order ORDER BY id LIMIT 2, 2

This returns the wrong rows, because each shard skips its own first two rows rather than the first two rows of the global ordering. The correct method is to request offset + limit rows from each shard:

SELECT * FROM t_order ORDER BY id LIMIT 0, 4

After gathering the results, an in-memory sort and slice produce the desired rows.
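The difference between the two approaches can be seen in a minimal Python sketch. The three in-memory lists are hypothetical stand-ins for shards, and `shard_query` only imitates running `ORDER BY id LIMIT offset, limit` on one shard; none of this is VtDriver code.

```python
# Hypothetical shards: each list holds the ids stored on one shard,
# already in the order that `ORDER BY id` would return them.
shards = [
    [1, 4, 7, 10],
    [2, 5, 8, 11],
    [3, 6, 9, 12],
]

def shard_query(rows, offset, limit):
    """Stand-in for `ORDER BY id LIMIT offset, limit` on a single shard."""
    return rows[offset:offset + limit]

def naive_page(shards, offset, limit):
    """Wrong: applies the global offset on every shard, then merges."""
    merged = sorted(r for s in shards for r in shard_query(s, offset, limit))
    return merged[:limit]

def correct_page(shards, offset, limit):
    """Fetch offset + limit rows per shard, sort, then slice globally."""
    merged = sorted(r for s in shards for r in shard_query(s, 0, offset + limit))
    return merged[offset:offset + limit]

print(naive_page(shards, 2, 2))    # [7, 8]
print(correct_page(shards, 2, 2))  # [3, 4]
```

With ids 1 to 12 spread over the shards, the globally correct answer to `LIMIT 2, 2` is rows 3 and 4, which only the second function returns.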

Deep pagination in a distributed setting dramatically increases the amount of data transferred. For example, a query for the 10 rows after offset 1,000,000 is rewritten to:

SELECT * FROM t_order ORDER BY id LIMIT 0, 1000010

Each shard then returns up to 1,000,010 rows, all of which have to be held and sorted in client memory, raising the risk of an out-of-memory (OOM) error.
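As a rough illustration of how the transfer volume scales, the upper bound is simply the shard count multiplied by offset + limit. The shard count of 4 below is an assumption chosen for the example; the source does not state one.

```python
def rows_transferred(shard_count, offset, limit):
    """Upper bound on rows sent to the client: each shard returns offset + limit rows."""
    return shard_count * (offset + limit)

# Assumed: 4 shards, with the offset and limit from the example above.
print(rows_transferred(shard_count=4, offset=1_000_000, limit=10))  # 4000040
```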

3. VtDriver Deep Pagination Optimizations

VtDriver pushes down queries that target a single shard without rewriting them, saving resources. For deep pagination, it can automatically switch to a streaming query when the offset exceeds a configurable deepPaginationThreshold. Streaming combined with merge‑sort limits memory usage to the current cursor rows from each shard.
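The idea of switching to a streamed merge once the offset passes a threshold can be sketched in Python as follows. This is not VtDriver's implementation: the function names, the threshold value of 1000, and the `range` objects standing in for sorted per-shard result streams are all assumptions made for illustration. Only the name deepPaginationThreshold comes from the text above.

```python
import heapq
from itertools import islice

# Hypothetical value; the source names the setting but not its default.
DEEP_PAGINATION_THRESHOLD = 1000

def buffered_page(shards, offset, limit):
    """Buffer offset + limit rows per shard, sort them all, then slice."""
    rows = sorted(r for s in shards for r in islice(s, offset + limit))
    return rows[offset:offset + limit]

def streamed_page(shards, offset, limit):
    """K-way merge over sorted per-shard iterators.

    heapq.merge keeps only one pending row per shard in memory; islice
    discards the first `offset` merged rows and stops after `limit` more.
    """
    merged = heapq.merge(*(iter(s) for s in shards))
    return list(islice(merged, offset, offset + limit))

def paginate(shards, offset, limit, threshold=DEEP_PAGINATION_THRESHOLD):
    """Pick the streaming path once the offset exceeds the threshold."""
    if offset > threshold:
        return "stream", streamed_page(shards, offset, limit)
    return "buffer", buffered_page(shards, offset, limit)

# Three hypothetical shards, each already sorted by id.
shards = [range(0, 30_000, 3), range(1, 30_000, 3), range(2, 30_000, 3)]
print(paginate(shards, 2, 2))       # ('buffer', [2, 3])
print(paginate(shards, 20_000, 3))  # ('stream', [20000, 20001, 20002])
```

In this sketch the streamed path bounds memory to one pending row per shard, but it still has to read and discard the skipped rows, so it addresses the OOM risk rather than the amount of data scanned.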

4. Optimization Recommendations

Prefer range queries when IDs are continuous, e.g.:

SELECT * FROM t_order WHERE id > 100000 AND id <= 100010 ORDER BY id

Or use the last retrieved ID as a cursor:

SELECT * FROM t_order WHERE id > 100000 ORDER BY id LIMIT 10
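A minimal sketch of the cursor approach, using Python's built-in `sqlite3` module in place of MySQL so that it runs without a server. The `note` column, the 100 sample rows, and the `next_page` helper are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_order (id INTEGER PRIMARY KEY, note TEXT)")
conn.executemany(
    "INSERT INTO t_order (id, note) VALUES (?, ?)",
    [(i, f"order-{i}") for i in range(1, 101)],
)

def next_page(conn, last_id, page_size):
    """Return the rows after `last_id` plus the cursor for the following page."""
    rows = conn.execute(
        "SELECT id, note FROM t_order WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()
    new_last_id = rows[-1][0] if rows else last_id
    return rows, new_last_id

first, cursor = next_page(conn, 0, 10)
second, cursor = next_page(conn, cursor, 10)
print([row[0] for row in second])  # [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
```

Each call seeks straight to the cursor position through the primary key, so the cost of a page does not depend on how many pages came before it. The trade-off is that a client can only move to the next page, not jump to an arbitrary page number.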

A sub-query can also locate the starting id using only the primary key, so that the outer query reads full rows for just the requested page:

SELECT * FROM t_order WHERE id >= (SELECT id FROM t_order ORDER BY id LIMIT 1000000, 1) ORDER BY id LIMIT 10;

However, when the data volume is extremely large, client‑side OOM risk remains, so these techniques should be applied judiciously.
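The sub-query rewrite can be checked against a plain offset query with a small sketch, again using Python's built-in `sqlite3` module as a stand-in for MySQL. The sample data uses only even ids to show that the rewrite does not rely on continuous ids; the `note` column and both helper functions are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_order (id INTEGER PRIMARY KEY, note TEXT)")
conn.executemany(
    "INSERT INTO t_order (id, note) VALUES (?, ?)",
    [(i, f"order-{i}") for i in range(2, 2002, 2)],  # 1000 rows, even ids only
)

def offset_page(conn, offset, limit):
    """Plain offset pagination, used here as the reference result."""
    return conn.execute(
        "SELECT id, note FROM t_order ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()

def subquery_page(conn, offset, limit):
    """Find the starting id through the primary key, then read `limit` rows."""
    return conn.execute(
        "SELECT id, note FROM t_order "
        "WHERE id >= (SELECT id FROM t_order ORDER BY id LIMIT 1 OFFSET ?) "
        "ORDER BY id LIMIT ?",
        (offset, limit),
    ).fetchall()

page = subquery_page(conn, 500, 10)
print(page[0][0], page[-1][0])            # 1002 1020
print(page == offset_page(conn, 500, 10)) # True
```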
