Deep Pagination in Elasticsearch: scroll, sliced scroll, and search_after
The traditional from + size approach cannot page through a large result set in Elasticsearch: by default, from + size may not exceed 10,000 (the index.max_result_window setting), because deep paging consumes a large amount of memory and can destabilize the cluster.
Elasticsearch provides three methods for deep pagination: scroll, sliced scroll, and search_after.
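To make the memory cost concrete, here is a rough sketch in Python. It assumes the usual query-phase behavior, in which every shard ranks its own top from + size hits and the coordinating node merges all of them. The function name and the shard count are illustrative, not part of any Elasticsearch API.

```python
def deep_page_cost(from_, size, num_shards, max_result_window=10_000):
    """Estimate how many hits a from + size request makes the cluster rank."""
    if from_ + size > max_result_window:
        raise ValueError(
            f"from + size = {from_ + size} exceeds max_result_window "
            f"({max_result_window})"
        )
    per_shard = from_ + size          # each shard ranks its own top from + size hits
    merged = per_shard * num_shards   # the coordinating node merges all of them
    return per_shard, merged


# Fetching only 10 documents at offset 9,990 on a hypothetical 5-shard index:
print(deep_page_cost(from_=9_990, size=10, num_shards=5))  # (10000, 50000)
```

The work grows with the offset rather than with the page size, which is why the window is capped.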
scroll
The scroll API returns a scroll ID (the _scroll_id field of the response) on the first request; each subsequent request passes this ID to retrieve the next batch of results. It is intended for offline bulk processing such as data export, migration, or reindexing, and is not suited to real-time user queries. A single scroll ID cannot be processed in parallel.
POST /twitter/_search?scroll=1m
{
"size": 100,
"query": {
"match" : {
"title" : "elasticsearch"
}
}
}
The scroll=1m parameter keeps the scroll context alive for one minute. Each follow-up request passes the scroll_id and renews that timeout:
POST /_search/scroll
{
"scroll" : "1m",
"scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}
Because a single scroll_id cannot be processed in parallel, Elasticsearch introduced sliced scroll, which allows multiple scrolls to run concurrently by dividing the data into slices (typically matching the number of shards).
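The two scroll requests shown earlier can be chained into a loop. The following is a minimal Python sketch, not a definitive client: send(method, path, body) is a hypothetical transport callable that performs the HTTP request and returns the decoded JSON response, so any HTTP library could sit behind it.

```python
def scroll_all(send, index, query, size=100, keep_alive="1m"):
    """Yield every hit for `query`, one scroll batch at a time.

    `send(method, path, body)` is a hypothetical transport callable that
    performs the HTTP request and returns the decoded JSON response.
    """
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": size, "query": query})
    scroll_id = resp.get("_scroll_id")
    try:
        # An empty batch means the result set is exhausted.
        while resp["hits"]["hits"]:
            yield from resp["hits"]["hits"]
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            # The ID may change between batches; always use the latest one.
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the scroll context early instead of waiting for the timeout.
        if scroll_id:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Clearing the context in the finally block is a design choice: open scroll contexts hold resources on the cluster until they expire.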
sliced scroll
In addition to the scroll keep-alive time, sliced scroll requires the total number of slices (max) and the ID of the current slice (id, counted from 0). Each slice then operates like a normal scroll request.
GET /twitter/_search?scroll=1m
{
"slice": {
"id": 0,
"max": 2
},
"query": {
"match" : {
"title" : "elasticsearch"
}
}
}
GET /twitter/_search?scroll=1m
{
"slice": {
"id": 1,
"max": 2
},
"query": {
"match" : {
"title" : "elasticsearch"
}
}
}
Because the slices can be consumed in parallel, sliced scroll can be considerably faster than a single scroll.
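As a sketch of how the slices might be consumed concurrently, the Python below builds one request body per slice and drains each in its own thread. scroll_slice(body) is a hypothetical callable, assumed to run a complete scroll loop for one slice and return its hits as a list.

```python
from concurrent.futures import ThreadPoolExecutor


def slice_bodies(query, max_slices):
    """Build one initial search body per slice (ids 0 .. max_slices - 1)."""
    return [
        {"slice": {"id": i, "max": max_slices}, "query": query}
        for i in range(max_slices)
    ]


def run_slices(scroll_slice, query, max_slices):
    """Drain every slice in its own thread and return all hits.

    `scroll_slice(body)` is a hypothetical callable that runs a full scroll
    loop for one slice body and returns its hits as a list.
    """
    bodies = slice_bodies(query, max_slices)
    with ThreadPoolExecutor(max_workers=max_slices) as pool:
        results = pool.map(scroll_slice, bodies)
        return [hit for hits in results for hit in hits]
```

Each slice owns its own scroll ID, so the restriction on parallel use of a single scroll ID is not violated.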
search_after
Neither scroll method is suitable for high-concurrency online queries. search_after works as a live cursor instead: it uses the sort values of the last hit on the previous page to fetch the next page, so no scroll context is needed.
GET twitter/_search
{
"size": 10,
"query": {
"match" : {
"title" : "elasticsearch"
}
},
"sort": [
{"date": "asc"},
{"tie_breaker_id": "asc"}
]
}
To retrieve the following page, pass the sort values of the last hit as search_after, keeping the same query and sort:
GET twitter/_search
{
"size": 10,
"query": {
"match" : {
"title" : "elasticsearch"
}
},
"search_after": [1463538857, "654323"],
"sort": [
{"date": "asc"},
{"tie_breaker_id": "asc"}
]
}
search_after does not support jumping to an arbitrary page, but it can serve many concurrent queries. It requires a sort that includes a unique field as a tie-breaker, so that the cursor identifies exactly one position in the results.
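The paging pattern can be sketched in Python as a generator that feeds each page's last sort values into the next request. search(body) is a hypothetical callable, assumed to send the body to the index's _search endpoint and return the decoded JSON response. The sketch relies on each hit of a sorted search carrying its sort values in a sort array.

```python
def search_after_pages(search, query, sort, size=10):
    """Yield pages of hits, using each page's last sort values as the cursor.

    `search(body)` is a hypothetical callable that sends the body to
    GET <index>/_search and returns the decoded JSON response.
    """
    body = {"size": size, "query": query, "sort": sort}
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield hits
        # Every page must reuse the same sort; only the cursor changes.
        body = {**body, "search_after": hits[-1]["sort"]}
```

A caller might iterate with for page in search_after_pages(search, query, [{"date": "asc"}, {"tie_breaker_id": "asc"}]), processing one page at a time without holding any server-side state.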
Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies