Deep Pagination in MySQL and Distributed Databases: Problems and VtDriver Optimizations
The article explains how large‑offset (deep) pagination in MySQL and distributed databases degrades performance and increases memory and bandwidth usage, and describes VtDriver's streaming and SQL‑rewrite techniques along with practical recommendations such as range queries and sub‑queries to mitigate these issues.
Front-end pages use pagination to avoid loading an entire dataset at once; MySQL supports this through the LIMIT clause.
1. MySQL Deep Pagination
When the offset in a LIMIT clause is very large, MySQL must skip many rows before returning the requested ones, causing severe performance loss. Example:
SELECT * FROM t_order ORDER BY id LIMIT 1000000, 10
Even with an index on id, MySQL still has to read and discard the first 1,000,000 rows before returning 10. This is referred to as deep pagination.
2. Deep Pagination in Distributed Databases
In a sharded environment (e.g., an elastic database), suppose the table t_order is partitioned by t_col1. To fetch the 2 rows after the second record, a naïve approach would run the following on every shard and merge the results:
SELECT * FROM t_order ORDER BY id LIMIT 2, 2
This yields incorrect results, because the rows each shard skips are not necessarily the first two rows of the global order. The correct method is to request offset + limit rows from each shard, starting at offset 0:
SELECT * FROM t_order ORDER BY id LIMIT 0, 4
After gathering the results, an in-memory sort and slice produce the desired rows.
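The difference between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not VtDriver code: the shards are hypothetical in-memory lists of id values, and shard_query, naive_page, and correct_page are names invented for this example.

```python
# Hypothetical in-memory shards: each list stands in for one shard of t_order,
# reduced to its id column.
shards = [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]

def shard_query(rows, offset, limit):
    """Emulates SELECT ... ORDER BY id LIMIT offset, limit on a single shard."""
    return sorted(rows)[offset:offset + limit]

def naive_page(shards, offset, limit):
    # Wrong: applies the global offset on every shard, then merges.
    merged = sorted(r for s in shards for r in shard_query(s, offset, limit))
    return merged[:limit]

def correct_page(shards, offset, limit):
    # Right: fetch the first offset + limit rows from each shard,
    # sort them in memory, then slice out the requested page.
    merged = sorted(r for s in shards for r in shard_query(s, 0, offset + limit))
    return merged[offset:offset + limit]

print(naive_page(shards, 2, 2))    # [7, 8]
print(correct_page(shards, 2, 2))  # [3, 4]
```

The global order is 1 through 12, so the two rows after the second record are 3 and 4. The naïve version skips two rows on every shard and therefore loses them.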
Deep pagination in a distributed setting dramatically increases data transferred; for example, retrieving 10 rows after offset 1,000,000 would be rewritten to:
SELECT * FROM t_order ORDER BY id LIMIT 0, 1000010
This sends over a million rows per shard, raising OOM risk.
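As a rough worked example of the cost, the number of rows sent to the client grows with both the offset and the shard count. The shard count of 4 below is an assumption chosen for illustration, and the figure is an upper bound that assumes every shard holds at least offset + limit rows.

```python
def rows_transferred(shard_count, offset, limit):
    """Upper bound on rows sent to the client when every shard
    is asked for its first offset + limit rows."""
    return shard_count * (offset + limit)

# Assumed: 4 shards. Offset and limit come from the query above.
print(rows_transferred(4, 1_000_000, 10))  # 4000040
```

Under these assumptions, about four million rows cross the network to return ten.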
3. VtDriver Deep Pagination Optimizations
VtDriver pushes down queries that target a single shard without rewriting them, saving resources. For deep pagination, it can automatically switch to a streaming query when the offset exceeds a configurable deepPaginationThreshold. Streaming combined with merge‑sort limits memory usage to the current cursor rows from each shard.
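The idea of streaming combined with merge-sort can be sketched with Python's standard library. This is not VtDriver's implementation, only an illustration of the technique: shard_stream is a hypothetical stand-in for a streaming result set, and the shards are in-memory lists of id values.

```python
import heapq
from itertools import islice

# Hypothetical shards, reduced to their id column.
shards = [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]

def shard_stream(rows):
    """Stands in for a streaming cursor: yields ids in order, one at a time.
    (A real shard would sort on the server; sorted() here is only for the demo.)"""
    yield from sorted(rows)

def stream_page(shards, offset, limit):
    # heapq.merge lazily merges sorted inputs and holds only one
    # pending row per stream in its heap.
    merged = heapq.merge(*(shard_stream(s) for s in shards))
    # islice discards the first `offset` rows as they arrive instead of
    # materializing them, then takes the next `limit` rows.
    return list(islice(merged, offset, offset + limit))

print(stream_page(shards, 2, 2))  # [3, 4]
```

The skipped rows still have to be read from each shard, but they are never all held in memory at once, which is the property the streaming approach relies on.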
4. Optimization Recommendations
Prefer range queries when IDs are continuous, e.g.:
SELECT * FROM t_order WHERE id > 100000 AND id <= 100010 ORDER BY id
Or use the last retrieved ID as a cursor:
SELECT * FROM t_order WHERE id > 100000 ORDER BY id LIMIT 10
Sub-queries can also move the offset lookup to the primary key, reducing transferred data:
SELECT * FROM t_order WHERE id >= (SELECT id FROM t_order ORDER BY id LIMIT 1000000, 1) ORDER BY id LIMIT 10;
However, when the data volume is extremely large, client-side OOM risk remains, so these techniques should be applied judiciously.
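The cursor approach can be tried end to end with a small script. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the amount column, the 100 sample rows, and the next_page helper are assumptions made for this example.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_order (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO t_order (id, amount) VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 101)],
)

def next_page(conn, last_id, page_size):
    """Returns the rows whose id follows last_id, in id order."""
    return conn.execute(
        "SELECT id, amount FROM t_order WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()

page1 = next_page(conn, 0, 10)
# The last id of the previous page becomes the cursor for the next one.
page2 = next_page(conn, page1[-1][0], 10)
print([row[0] for row in page2])  # [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
```

Because each request filters on the primary key, no rows are skipped by offset. The trade-off is that the client can only move to the next page, not jump to an arbitrary page number.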
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
