How to Overcome Deep Pagination Bottlenecks with Cursor, Subqueries, and Elasticsearch Search After
This article explains why deep pagination hurts performance in large datasets, demonstrates the inefficiency of offset‑based SQL queries, and presents practical alternatives such as cursor pagination, sub‑query optimizations, delayed joins, caching, business limits, and the Elasticsearch Search After technique for stable, scalable paging.
Introduction
Pagination is a common requirement in business applications, but when the data volume grows, traditional offset‑based pagination becomes a hidden performance killer known as the deep pagination problem.
What Is Deep Pagination?
When a table contains millions of rows and a user requests a page far from the start (e.g., page 9999 with 10 rows per page), the typical SQL query uses LIMIT offset, size to skip the preceding rows. The database must still scan all rows up to the offset, causing the query time to increase dramatically and potentially overwhelming the database.
SELECT * FROM table ORDER BY id LIMIT 99990, 10;The core issue is that larger offsets lead to lower query efficiency .
Solutions to Deep Pagination
1. Cursor‑Based Pagination
Uses a unique column (e.g., an auto‑increment ID or timestamp) as a cursor. The next query fetches rows where the cursor is greater than the last value from the previous page.
SELECT * FROM table WHERE id > last_id ORDER BY id LIMIT 10;Advantages: stable performance. Limitation: cannot jump directly to an arbitrary page.
2. Sub‑Query Optimization
First locate the start position with a sub‑query, then fetch the required rows.
SELECT * FROM table WHERE id >= (SELECT id FROM table ORDER BY id LIMIT 99990, 1) LIMIT 10;A variant that does not rely on ordered IDs:
SELECT t1.* FROM account t1, (SELECT id FROM account WHERE update_time >= '2020-09-19' LIMIT 100000, 10) t2 WHERE t1.id = t2.id;This reduces the scanned range but still cannot fully eliminate the deep‑pagination cost.
3. Delayed Join (Inner Join Instead of Sub‑Query)
Similar to the sub‑query method but replaces the sub‑query with an INNER JOIN to avoid extra table scans.
SELECT t.* FROM table AS t INNER JOIN (SELECT id FROM table ORDER BY id LIMIT 99990, 10) AS sub ON t.id = sub.id;4. Cache Paginated Data
For relatively static data, cache the first few pages (e.g., in Redis) to reduce database load. Suitable only when data updates are infrequent.
5. Business‑Level Limits
Restrict the maximum range a user can query (e.g., limit order history to the last three months) and disallow full‑table scans, as many large‑scale sites do.
Why Elasticsearch Can Solve Deep Pagination
Elasticsearch is a distributed search engine designed for massive datasets. Its Search After feature provides a stateless, cursor‑like pagination that avoids offset calculations entirely.
Search After Basics
Each document must contain a globally unique field (commonly _id) used for sorting. The query returns the sort values of the last hit; the next request supplies those values in the search_after parameter.
{
"size": 10,
"sort": [
{"order_date": "asc"},
{"_id": "asc"}
]
}Response includes order_date and _id for each hit.
{
"size": 10,
"sort": [
{"order_date": "asc"},
{"_id": "asc"}
],
"search_after": [last_order_date, last_id]
}This method returns the next page without scanning previous pages.
Advantages of Search After
Performance remains stable regardless of page number.
Handles billions of records thanks to Elasticsearch’s distributed architecture.
Supports multi‑field sorting, fitting complex business scenarios.
Limitations of Search After
Cannot jump to an arbitrary page; navigation is sequential.
Requires a unique sort field; otherwise pagination may produce inconsistent results.
Summary
Deep pagination is a frequent performance bottleneck in large‑scale data retrieval. Offset‑based LIMIT queries degrade quickly, while alternatives such as cursor pagination, sub‑query optimizations, delayed joins, caching, and business limits can mitigate the issue. Elasticsearch’s Search After offers a particularly elegant solution by providing stable, scalable pagination without the overhead of offset scans.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
