Elasticsearch Pagination: From/Size, Deep Paging Issues, Scroll, Search After, PIT and Best Practices
This article explains Elasticsearch pagination mechanisms—including from/size, deep paging drawbacks, scroll, scroll‑scan, sliced scroll, search_after and point‑in‑time—detailing their inner workings, performance trade‑offs, configuration limits, and practical recommendations for handling large result sets.
Introduction
Elasticsearch is a real‑time distributed search and analytics engine. As in relational databases, deep pagination is expensive and should be avoided. This article walks through the pagination techniques Elasticsearch offers and when each one fits.
From/Size Parameters
By default a search returns the top 10 hits. Pagination can be performed using the from and size parameters.
from defines the number of hits to skip (default 0).
size defines the maximum number of hits to return.
Example query:
POST /my_index/my_type/_search
{
"query": { "match_all": {} },
"from": 100,
"size": 10
}
This request returns 10 documents starting from the 101st hit.
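As a minimal sketch, a page number can be turned into the from/size pair like this. The helper name page_request and its max_result_window argument are illustrative assumptions, not part of any Elasticsearch client; the guard mirrors the index.max_result_window limit discussed later in this article.

```python
def page_request(page: int, size: int = 10, max_result_window: int = 10_000) -> dict:
    """Build a from/size search body for a 1-based page number (hypothetical helper)."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    offset = (page - 1) * size  # number of hits to skip
    if offset + size > max_result_window:
        # Elasticsearch would reject such a request; fail early instead.
        raise ValueError("from + size exceeds index.max_result_window")
    return {"query": {"match_all": {}}, "from": offset, "size": size}


body = page_request(page=11, size=10)
print(body["from"], body["size"])  # 100 10
```

Page 11 with 10 hits per page skips 100 hits, which matches the request above.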
Query and Fetch Phases
Elasticsearch executes a search in two stages:
Query phase – determines which documents match.
Fetch phase – retrieves the actual document source.
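During the query phase the coordinating node creates a priority queue of size from + size and broadcasts the request to the shards. Each shard builds its own queue of the same size and returns its candidates (document IDs and sort values), which the coordinating node merges to produce the final top‑N list. Only the hits on the requested page are then loaded in the fetch phase.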
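The merge step can be simulated in a few lines. This is a toy model under stated assumptions, not Elasticsearch's implementation: each shard is just a list of scores, and the function names shard_top and coordinate are made up for the illustration. It also counts how many candidates the coordinating node receives.

```python
import heapq
import itertools
import random


def shard_top(scores, n):
    """Shard-local queue: the n best scores on this shard, best first."""
    return heapq.nlargest(n, scores)


def coordinate(shards, from_, size):
    """Merge per-shard candidates and cut out the requested page."""
    n = from_ + size
    candidates = [shard_top(scores, n) for scores in shards]
    # Every candidate list is sorted descending, so a k-way merge is enough.
    merged = heapq.merge(*candidates, reverse=True)
    top = list(itertools.islice(merged, n))
    received = sum(len(c) for c in candidates)
    return top[from_:], received


rng = random.Random(0)
shards = [[rng.random() for _ in range(1000)] for _ in range(5)]
page, received = coordinate(shards, from_=100, size=10)
print(len(page), received)  # 10 550
```

With 5 shards, a 10-hit page at offset 100 makes the coordinating node handle 550 candidates, of which 540 are thrown away.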
Deep Pagination Problems
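When from is large, every shard must collect and return from + size candidates, so the coordinating node has to sort number_of_shards × (from + size) entries only to discard most of them. CPU, memory, I/O and network usage therefore all grow with paging depth. The index.max_result_window setting (default 10 000) caps from + size; it can be raised if needed, at the price of more heap and time per request: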
PUT _settings
{
"index": { "max_result_window": "10000000" }
}
Official Deep‑Paging Solutions
Scroll
Scroll works like a cursor in relational databases and is suited for batch processing (e.g., mass messaging). It creates a snapshot of the index at the start of the scroll and returns a _scroll_id; subsequent requests pass that ID to fetch the next batch.
POST /twitter/tweet/_search?scroll=1m
{
"size": 100,
"query": { "match": { "title": "elasticsearch" } }
}
Subsequent fetch:
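POST /_search/scroll
{ "scroll": "1m", "scroll_id": "<_scroll_id from the previous response>" }
Drawbacks: the snapshot behind each _scroll_id consumes resources until it expires or is cleared, and the results do not reflect changes made after the scroll started.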
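The request/response cycle above amounts to a simple loop. The sketch below shows that loop against a fake in-memory backend so it runs without a cluster; scroll_all, make_fake_backend and the two callables are hypothetical stand-ins for the initial search request and the follow-up scroll request, not real client functions.

```python
from typing import Callable, Iterator


def scroll_all(start: Callable[[], dict],
               fetch_next: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every hit: run the initial search, then follow _scroll_id until a batch is empty."""
    response = start()
    while response["hits"]["hits"]:
        yield from response["hits"]["hits"]
        # Always use the most recent _scroll_id returned by the server.
        response = fetch_next(response["_scroll_id"])


def make_fake_backend(docs, size):
    """In-memory stand-in for the two HTTP calls (assumption for the demo)."""
    state = {"pos": 0}

    def batch():
        hits = docs[state["pos"]:state["pos"] + size]
        state["pos"] += len(hits)
        return {"_scroll_id": "fake-id", "hits": {"hits": hits}}

    return batch, lambda scroll_id: batch()


docs = [{"_id": str(i)} for i in range(250)]
start, fetch_next = make_fake_backend(docs, size=100)
exported = list(scroll_all(start, fetch_next))
print(len(exported))  # 250
```

In a real job, start and fetch_next would issue the two requests shown above, and the scroll should be cleared (DELETE /_search/scroll) once the loop finishes so the snapshot is released before the keep-alive expires.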
Scroll Scan
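Scroll Scan disables sorting for higher performance. It requires search_type=scan, and the size parameter applies per shard, so a batch can contain up to size × number_of_shards hits. Note that search_type=scan was deprecated in Elasticsearch 2.1 and removed in 5.0; on later versions a regular scroll sorted by _doc serves the same purpose.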
POST /my_index/my_type/_search?search_type=scan&scroll=1m&size=50
{ "query": { "match_all": {} } }
Sliced Scroll
Sliced scroll splits a scroll request into multiple parallel slices, speeding up large data extraction.
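A minimal sketch of how the per-slice request bodies might be generated, assuming one worker per slice. The helper name slice_bodies is hypothetical; only the "slice" object with "id" and "max" comes from the Elasticsearch request format.

```python
import copy


def slice_bodies(body: dict, max_slices: int) -> list:
    """Return one scroll request body per slice (hypothetical helper)."""
    if max_slices < 2:
        # A sliced scroll needs at least two slices.
        raise ValueError("max_slices must be at least 2")
    bodies = []
    for slice_id in range(max_slices):
        sliced = copy.deepcopy(body)  # leave the caller's body untouched
        sliced["slice"] = {"id": slice_id, "max": max_slices}
        bodies.append(sliced)
    return bodies


bodies = slice_bodies({"query": {"match_all": {}}}, max_slices=5)
print([b["slice"]["id"] for b in bodies])  # [0, 1, 2, 3, 4]
```

Each body starts an independent scroll that can be consumed in its own thread or process. Written as raw requests, the first two of five slices look like this: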
POST /index/type/_search?scroll=1m
{
"query": { "match_all": {} },
"slice": { "id": 0, "max": 5 }
}
POST /index/type/_search?scroll=1m
{
"query": { "match_all": {} },
"slice": { "id": 1, "max": 5 }
}
Search After
Introduced in ES 5, search_after provides a stateless cursor using the sort values of the last hit of the previous page.
POST twitter/_search
{
"size": 10,
"query": { "match": { "title": "es" } },
"sort": [ { "date": "asc" }, { "_id": "desc" } ]
}
Use the sort array returned with the last hit for the next request:
GET twitter/_search
{
"size": 10,
"query": { "match": { "title": "es" } },
"search_after": [124648691, "624812"],
"sort": [ { "date": "asc" }, { "_id": "desc" } ]
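}
Advantages: no snapshot, real‑time data, high performance. Disadvantages: the sort must include a field with unique values as a tiebreaker (otherwise hits with equal sort values can be skipped or repeated), and it cannot jump to an arbitrary page.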
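The paging loop that follows from this can be sketched as below. To keep the example runnable without a cluster, the search itself is faked in memory; iterate_hits, make_fake_search and the search_page callable are hypothetical names, and the fake assumes numeric IDs so that "_id desc" can be emulated.

```python
from typing import Callable, Iterator, Optional


def iterate_hits(search_page: Callable[[Optional[list]], list]) -> Iterator[dict]:
    """Walk all pages, feeding the last hit's sort values back as search_after."""
    cursor = None  # the first request carries no search_after
    while True:
        hits = search_page(cursor)
        if not hits:
            return
        yield from hits
        cursor = hits[-1]["sort"]


def make_fake_search(docs, size):
    """In-memory stand-in for the search request (assumption for the demo)."""
    def key(sort_values):
        date, doc_id = sort_values
        return (date, -int(doc_id))  # date asc, _id desc

    ordered = sorted(docs, key=lambda d: key(d["sort"]))

    def search_page(cursor):
        if cursor is None:
            remaining = ordered
        else:
            remaining = [d for d in ordered if key(d["sort"]) > key(cursor)]
        return remaining[:size]

    return search_page


# Three documents share each date, so the _id tiebreaker matters.
docs = [{"_id": str(i), "sort": [i // 3, str(i)]} for i in range(25)]
hits = list(iterate_hits(make_fake_search(docs, size=10)))
print(len(hits), hits[0]["_id"], hits[-1]["_id"])  # 25 2 24
```

Because the cursor is just the previous page's last sort values, the client holds all the state and the server keeps nothing between requests.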
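Point‑In‑Time (PIT) with Search After (ES 7.10+)
From ES 7.10 onward, when the point‑in‑time API was introduced, using PIT with search_after is the recommended approach for deep pagination. A PIT fixes the view of the data across requests, so pages stay consistent even while the index changes.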
POST /my-index-000001/_pit?keep_alive=1m
Then include the PIT ID in the search request.
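A sketch of what the follow-up search bodies might look like. The helper pit_search_body and the placeholder ID are illustrative assumptions; the "pit" object with "id" and "keep_alive" is the part taken from the Elasticsearch request format. A search that carries a PIT is sent to /_search without an index name, since the PIT already identifies the target.

```python
from typing import Optional


def pit_search_body(pit_id: str, sort: list, size: int = 10,
                    search_after: Optional[list] = None,
                    keep_alive: str = "1m") -> dict:
    """Build a search body that pages inside a point in time (hypothetical helper)."""
    body = {
        "size": size,
        "sort": sort,
        # keep_alive here extends the PIT's lifetime with every page fetched.
        "pit": {"id": pit_id, "keep_alive": keep_alive},
    }
    if search_after is not None:
        body["search_after"] = search_after
    return body


sort = [{"date": "asc"}]
first = pit_search_body("<id returned by the _pit request>", sort)
second = pit_search_body("<id returned by the _pit request>", sort,
                         search_after=[124648691])
print("search_after" in first, "search_after" in second)  # False True
```

When paging is finished, the PIT can be released with a DELETE /_pit request carrying the same ID, rather than waiting for keep_alive to run out.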
Performance Comparison
| Pagination Method | Performance | Pros | Cons | Use Case |
| --- | --- | --- | --- | --- |
| from + size | Low | Simple, flexible | Deep‑paging cost | Small data sets (<10k) |
| scroll | Medium | Solves deep‑paging, good for bulk export | Snapshot overhead, scroll_id management | Mass data export |
| search_after | High | Best performance, reflects real‑time changes | Complex implementation, needs unique sort field, not for large jumps | Large‑scale real‑time pagination |
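Paging Backward
search_after only moves toward later results, and Elasticsearch has no native API for returning to the previous page. It can be simulated by reversing the sort order and using search_after with the sort values of the first hit on the current page; the hits returned are then in reverse order and must be flipped before display.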
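A minimal sketch of that trick, assuming the sort is written in the short {"field": "asc"|"desc"} form used in this article. Both function names are hypothetical.

```python
def previous_page_body(query: dict, sort: list, first_hit_sort: list, size: int) -> dict:
    """Build the request for the page *before* the current one (hypothetical helper)."""
    flip = {"asc": "desc", "desc": "asc"}
    reversed_sort = [
        {field: flip[order]}
        for clause in sort
        for field, order in clause.items()
    ]
    return {
        "size": size,
        "query": query,
        "sort": reversed_sort,
        # Start just before the first hit of the page currently shown.
        "search_after": first_hit_sort,
    }


def restore_order(hits: list) -> list:
    """Hits arrive nearest-first; flip them back into display order."""
    return list(reversed(hits))


body = previous_page_body(
    query={"match": {"title": "es"}},
    sort=[{"date": "asc"}, {"_id": "desc"}],
    first_hit_sort=[124648691, "624812"],
    size=10,
)
print(body["sort"])  # [{'date': 'desc'}, {'_id': 'asc'}]
```

An application that only needs to step back one or two pages could instead cache the search_after values it used for earlier pages and simply replay them.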
Conclusion
If the total result window is under 10 000 or only top‑N results are needed, use from/size.
For large data sets and batch jobs, use scroll (or scroll‑scan on versions before 5.0).
For large data sets with real‑time, high‑concurrency queries, prefer search_after (optionally with PIT in ES 7.10+).
Personal Thoughts
Both scroll and search_after rely on cursor‑like mechanisms to avoid deep‑paging costs, but they are compromises: scroll requires maintaining a snapshot and scroll_id, while search_after cannot jump arbitrarily and may produce inconsistent results if the index changes between pages.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.