Deep Pagination in Elasticsearch: scroll, sliced scroll, and search_after

When retrieving large result sets from Elasticsearch, the traditional from + size method is capped at 10,000 results by default and becomes expensive enough to destabilize the cluster. Elasticsearch offers three deep-pagination techniques (scroll, sliced scroll, and search_after) for fetching large result sets efficiently.

Big Data Technology Architecture

Aug 28, 2019

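When fetching data from Elasticsearch, the traditional from + size approach cannot page past the first 10,000 results by default: requests where from + size exceeds the index.max_result_window setting (default 10,000) are rejected. The limit exists because every shard must collect and sort from + size hits and the coordinating node must merge all of them, so deep pages consume large amounts of memory and can destabilize the cluster.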
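
The cost of deep from + size paging can be made concrete with a little arithmetic. The sketch below assumes the usual query-phase behaviour, in which each shard returns its own top from + size hits to the coordinating node. The function name and the five-shard figure are illustrative only and are not part of any Elasticsearch API.

```python
def hits_merged_by_coordinator(from_: int, size: int, shards: int) -> int:
    """Number of hits the coordinating node has to merge for one page.

    Assumption: every shard returns its own top (from_ + size) hits,
    and the coordinator sorts them all before discarding the first from_.
    """
    return shards * (from_ + size)


# Page 1 versus the last page allowed by the default 10,000 window,
# on a hypothetical index with 5 shards.
first_page = hits_merged_by_coordinator(from_=0, size=10, shards=5)
deep_page = hits_merged_by_coordinator(from_=9990, size=10, shards=5)
print(first_page, deep_page)
```

Both requests return ten documents to the caller, but under these assumptions the deep page makes the cluster handle a thousand times more hits.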

Elasticsearch provides three methods for deep pagination: scroll, sliced scroll, and search_after.

scroll

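The scroll API returns a scroll_id on the first request; subsequent requests pass this ID to retrieve the next batch of results. It is intended for offline bulk processing such as data export, migration, or reindexing. Because a scroll reads from a snapshot of the index taken when the first request was issued, and every open scroll context holds resources on the cluster, it is not suited to real-time user queries. A single scroll_id also cannot be processed in parallel.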

POST /twitter/_search?scroll=1m
{
    "size": 100,
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}

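The scroll=1m parameter keeps the search context alive for one minute. Each follow-up request passes the scroll_id returned by the previous response and, through its own scroll parameter, resets the timeout for another minute: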

POST /_search/scroll
{
    "scroll" : "1m",
    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}
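
The two requests above are normally wrapped in a loop that keeps fetching until a batch comes back empty. The following is a minimal sketch of that loop, not a definitive client: start, fetch_next, and clear are hypothetical callables standing in for POST /twitter/_search?scroll=1m, POST /_search/scroll, and the clear-scroll request (DELETE /_search/scroll), and the in-memory "cluster" exists only so the example runs without Elasticsearch.

```python
from typing import Callable, Dict, Iterator, List


def scroll_all(
    start: Callable[[], Dict],
    fetch_next: Callable[[str], Dict],
    clear: Callable[[str], None],
) -> Iterator[List[Dict]]:
    """Yield batches of hits until the scroll is exhausted."""
    response = start()
    scroll_id = response["_scroll_id"]
    try:
        while response["hits"]["hits"]:
            yield response["hits"]["hits"]
            response = fetch_next(scroll_id)
            # Always continue with the most recently returned ID.
            scroll_id = response["_scroll_id"]
    finally:
        # Free the search context instead of waiting for the timeout.
        clear(scroll_id)


# In-memory stand-in for a cluster holding 5 documents, served 2 at a time.
docs = [{"_id": str(i)} for i in range(5)]
state = {"pos": 0, "cleared": False}


def _page() -> Dict:
    batch = docs[state["pos"]:state["pos"] + 2]
    state["pos"] += len(batch)
    return {"_scroll_id": "fake-id", "hits": {"hits": batch}}


def _clear(scroll_id: str) -> None:
    state["cleared"] = True


batches = list(scroll_all(_page, lambda scroll_id: _page(), _clear))
print([len(b) for b in batches])
```

Clearing the scroll explicitly is a design choice in this sketch: the context would also expire on its own once the keep-alive time passes.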

Because a single scroll_id cannot be processed in parallel, Elasticsearch introduced sliced scroll, which divides the result set into slices (typically as many as there are shards) so that several scrolls can run concurrently.

sliced scroll

In addition to the scroll keep-alive time, a sliced scroll request specifies the total number of slices (max) and the ID of the current slice (id, counted from 0). Each slice then behaves like an ordinary scroll with its own scroll_id.

GET /twitter/_search?scroll=1m
{
    "slice": {
        "id": 0,
        "max": 2
    },
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}

GET /twitter/_search?scroll=1m
{
    "slice": {
        "id": 1,
        "max": 2
    },
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}

Because the slices can be consumed by separate workers at the same time, a sliced scroll can finish a large export considerably faster than a single scroll.
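
Writing one request per slice by hand does not scale beyond a couple of slices, so the bodies are usually generated. The sketch below builds them and hands each one to its own thread. It rests on assumptions: slice_bodies and run_slices are hypothetical helper names, and consume stands for a worker that would scroll a single slice to the end and return the number of documents it processed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def slice_bodies(query: Dict, max_slices: int) -> List[Dict]:
    """Build one scroll request body per slice, with ids 0 .. max_slices - 1."""
    if max_slices < 2:
        # With a single slice there is nothing to parallelise;
        # use a plain scroll instead.
        raise ValueError("max_slices must be at least 2")
    return [
        {"slice": {"id": slice_id, "max": max_slices}, "query": query}
        for slice_id in range(max_slices)
    ]


def run_slices(bodies: List[Dict], consume: Callable[[Dict], int]) -> int:
    """Run consume(body) for every slice in its own thread and sum the results."""
    with ThreadPoolExecutor(max_workers=len(bodies)) as pool:
        return sum(pool.map(consume, bodies))


bodies = slice_bodies({"match": {"title": "elasticsearch"}}, max_slices=2)
print(bodies[0]["slice"], bodies[1]["slice"])
```

With max_slices=2 this reproduces the two request bodies shown above. In real use, each worker would keep its own scroll_id and loop over its slice independently.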

search_after

Both scroll methods are unsuitable for high-concurrency online queries. search_after takes a live cursor approach instead: it uses the sort values of the last hit on the previous page to fetch the next page, so no scroll context has to be kept open. The first page is an ordinary sorted search:

GET twitter/_search
{
    "size": 10,
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "sort": [
        {"date": "asc"},
        {"tie_breaker_id": "asc"}
    ]
}

To retrieve the next page, supply the sort values of the last hit from the previous page as search_after:

GET twitter/_search
{
    "size": 10,
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "search_after": [1463538857, "654323"],
    "sort": [
        {"date": "asc"},
        {"tie_breaker_id": "asc"}
    ]
}

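search_after cannot jump to an arbitrary page, since each request depends on the last hit of the previous one, but it is stateless and therefore handles many concurrent queries well. The query and sort must stay identical from page to page, and the sort must include a field with a unique value per document (tie_breaker_id above) as a tiebreaker; otherwise documents with equal sort values can be skipped or repeated across pages.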
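
Deriving the next request from the previous response is mechanical, as the sketch below illustrates. It relies on the "sort" array that Elasticsearch attaches to each hit of a sorted search. The helper name next_page_body and the response fragment are hypothetical and only show the shape of the data.

```python
import copy
from typing import Dict


def next_page_body(previous_body: Dict, previous_response: Dict) -> Dict:
    """Return the request body for the page after previous_response.

    The query and sort are copied unchanged; only search_after is set,
    using the sort values of the last hit on the previous page.
    """
    hits = previous_response["hits"]["hits"]
    if not hits:
        raise ValueError("no hits on the previous page, nothing left to fetch")
    body = copy.deepcopy(previous_body)
    body["search_after"] = list(hits[-1]["sort"])
    return body


first_body = {
    "size": 10,
    "query": {"match": {"title": "elasticsearch"}},
    "sort": [{"date": "asc"}, {"tie_breaker_id": "asc"}],
}
# Hypothetical response fragment: only the last hit matters here.
first_response = {"hits": {"hits": [{"_id": "1", "sort": [1463538857, "654323"]}]}}

second_body = next_page_body(first_body, first_response)
print(second_body["search_after"])
```

Copying the body rather than rebuilding it keeps the query and sort identical across pages, which the cursor depends on.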
