Mastering Elasticsearch Pagination: From From/Size to Scroll and Search After

Elasticsearch offers several pagination strategies: simple from/size, scroll, scroll-scan, sliced scroll, and the newer search_after with point-in-time. Each has distinct performance trade-offs and suits different use cases. This guide explains their mechanics, limitations, and best-practice recommendations for handling deep pagination.

Su San Talks Tech

Introduction

Elasticsearch is a real-time distributed search and analytics engine. Paginating through results and traversing large result sets are among its most common tasks. As in relational databases, deep pagination should be avoided.

From/Size Parameters

By default, a search returns the top 10 hits. To paginate, use the from and size parameters. from defines the number of hits to skip (default 0). size defines the maximum number of hits to return.

POST /my_index/my_type/_search
{
  "query": { "match_all": {} },
  "from": 100,
  "size": 10
}

This query skips the first 100 hits and returns the next 10 (hits 101 through 110).
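
Applications usually think in page numbers rather than offsets. The following is a minimal sketch of that translation, assuming 1-based page numbers; the helper name page_to_from_size is hypothetical and not part of any Elasticsearch client.

```python
def page_to_from_size(page: int, page_size: int = 10) -> dict:
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    # from is the number of hits to skip, so page 1 starts at offset 0.
    return {"from": (page - 1) * page_size, "size": page_size}


# Page 11 with 10 hits per page skips the first 100 hits.
print(page_to_from_size(11))  # {'from': 100, 'size': 10}
```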

How is this query executed internally?

Search consists of two phases: query (determines which documents to collect) and fetch (retrieves the actual documents).

Query phase

The coordinating node broadcasts the request to a copy of every relevant shard. Each shard runs the query locally, fills a priority queue of size from + size, and returns the document IDs and sort values of its top from + size hits. The coordinating node merges these into its own global priority queue and selects the hits for the fetch phase.

Fetch phase

During fetch, the coordinating node requests the actual documents for the IDs in the global queue, typically using a multi‑get request to avoid repeated calls to the same shard.
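
The two phases can be imitated in memory to make the data flow concrete. This is a toy model only, not how Elasticsearch is implemented: shards are plain lists of (score, doc_id) pairs, and the function names shard_query, coordinate, and fetch are made up for illustration.

```python
import heapq


def shard_query(shard_docs, from_, size):
    """Query phase on one shard: return its top from+size (score, doc_id) pairs."""
    return heapq.nlargest(from_ + size, shard_docs)


def coordinate(shards, from_, size):
    """Merge per-shard candidates, then keep only the IDs of the requested page."""
    candidates = []
    for shard_docs in shards:
        candidates.extend(shard_query(shard_docs, from_, size))
    merged = heapq.nlargest(from_ + size, candidates)
    return [doc_id for _, doc_id in merged[from_:from_ + size]]


def fetch(store, doc_ids):
    """Fetch phase: load the full documents for the selected IDs."""
    return [store[doc_id] for doc_id in doc_ids]


shards = [
    [(0.9, "a"), (0.4, "b"), (0.1, "c")],
    [(0.8, "d"), (0.7, "e"), (0.2, "f")],
]
store = {doc_id: {"_id": doc_id, "score": score}
         for shard in shards for score, doc_id in shard}

page_ids = coordinate(shards, from_=2, size=2)
print(page_ids)  # ['e', 'b']
print(fetch(store, page_ids))
```

Note that every shard hands back from + size candidates even though only size of them survive the merge, which is the root of the deep pagination problem.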

Deep Pagination Issues

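When from is large (e.g., 1,000,000) and size is 100, each shard must build and return a queue of 1,000,100 entries, and the coordinating node must merge number_of_shards * 1,000,100 entries only to discard all but 100 of them. This drives up CPU, memory, I/O, and network usage. The cost grows with both pagination depth and shard count, so deep pages become prohibitively expensive.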
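
A quick back-of-the-envelope calculation shows the scale of the problem. The shard count of 5 below is an assumption chosen for illustration, and coordinator_entries is a hypothetical helper.

```python
def coordinator_entries(num_shards: int, from_: int, size: int) -> int:
    """Entries the coordinating node must merge: every shard sends from + size."""
    return num_shards * (from_ + size)


shallow = coordinator_entries(num_shards=5, from_=0, size=100)
deep = coordinator_entries(num_shards=5, from_=1_000_000, size=100)

print(shallow)  # 500
print(deep)     # 5000500
```

Both requests return the same 100 hits to the client, yet the deep one makes the coordinating node handle roughly ten thousand times as many entries.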

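Additional constraints: from + size cannot exceed index.max_result_window (default 10,000), and requests beyond it are rejected. The limit can be raised in the index settings, at the cost of more heap and CPU for each deep request: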

PUT _settings
{
  "index": { "max_result_window": "10000000" }
}
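
A client can check a request against the window before sending it. The sketch below assumes the default limit of 10,000; fits_result_window is a hypothetical helper, not an Elasticsearch API.

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def fits_result_window(from_, size, max_result_window=DEFAULT_MAX_RESULT_WINDOW):
    """Return True when from + size stays within the configured window."""
    return from_ + size <= max_result_window


print(fits_result_window(9_990, 10))  # True: from + size is exactly 10,000
print(fits_result_window(9_991, 10))  # False: from + size is 10,001
```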

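Note that mapping types, such as my_type in the request paths in this guide, are deprecated in Elasticsearch 7.x and removed in 8.0. On those versions, omit the type segment from the path (for example, POST /my_index/_search).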

Scroll

Scroll Traversal

Scroll works like a cursor in relational databases and is suited for batch processing rather than real‑time queries.

It creates a snapshot of the index at the time of the request; new documents are not visible during the scroll.

Basic usage:

POST /twitter/tweet/_search?scroll=1m
{
  "size": 100,
  "query": { "match": { "title": "elasticsearch" } }
}

The response includes a _scroll_id used for subsequent requests:

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "XXXXXXXXXXXXXXXXXXXXXXX I am scroll id XXXXXXXXXXXXXXX"
}

Drawbacks: every open scroll context holds resources on the cluster until it expires or is cleared, the snapshot is static, and the method is not suitable for real-time, user-facing pagination.
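
The snapshot behaviour and the "keep calling until the batch is empty" loop can be illustrated with a toy cursor. This is an in-memory stand-in, not a real client: the class SnapshotScroll and the function drain are invented for this sketch.

```python
class SnapshotScroll:
    """Toy stand-in for a scroll context: freezes the hit list when opened."""

    def __init__(self, index_docs, size):
        self._snapshot = list(index_docs)  # copy taken once = fixed view
        self._size = size
        self._pos = 0

    def next_batch(self):
        batch = self._snapshot[self._pos:self._pos + self._size]
        self._pos += len(batch)
        return batch


def drain(scroll):
    """Request batches until an empty one signals the end of the scroll."""
    results = []
    while True:
        batch = scroll.next_batch()
        if not batch:
            break
        results.extend(batch)
    return results


index_docs = ["doc-1", "doc-2", "doc-3"]
scroll = SnapshotScroll(index_docs, size=2)
index_docs.append("doc-4")  # indexed after the scroll was opened

all_hits = drain(scroll)
print(all_hits)  # ['doc-1', 'doc-2', 'doc-3']
```

The document added after the scroll was opened never appears, which is acceptable for exports and reindexing but misleading for a live result page.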

Scroll Scan

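Scroll Scan improves performance by disabling sorting. It is appropriate when ordering is not required. Note that search_type=scan was deprecated in Elasticsearch 2.1 and removed in 5.0; on later versions, a regular scroll sorted by _doc gives the same benefit.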

POST /my_index/my_type/_search?search_type=scan&scroll=1m&size=50
{
  "query": { "match_all": {} }
}

Key points: search_type=scan disables sorting. size applies per shard, so each batch can contain up to number_of_shards * size hits.

Sliced Scroll

Sliced scroll splits a scroll request into multiple parallel slices, speeding up large data traversals.

POST /index/type/_search?scroll=1m
{
  "query": { "match_all": {} },
  "slice": { "id": 0, "max": 5 }
}

POST /index/type/_search?scroll=1m
{
  "query": { "match_all": {} },
  "slice": { "id": 1, "max": 5 }
}

Keep max at or below the number of shards. With more slices than shards, the slice filter becomes slow on the first calls and can consume significant memory.
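
The idea behind slicing is that each document lands in exactly one slice, so workers can scroll their slices in parallel without overlap. The sketch below illustrates that partitioning property with a CRC32 hash of the document ID; the hash choice and the helper names are assumptions for illustration and do not reproduce Elasticsearch's internal routing.

```python
import zlib


def slice_of(doc_id: str, max_slices: int) -> int:
    """Assign a document to a slice using a stable hash of its ID."""
    return zlib.crc32(doc_id.encode("utf-8")) % max_slices


def docs_for_slice(doc_ids, slice_id, max_slices):
    """Documents a worker would see when scrolling the given slice."""
    return [d for d in doc_ids if slice_of(d, max_slices) == slice_id]


doc_ids = [f"doc-{i}" for i in range(100)]
slices = [docs_for_slice(doc_ids, slice_id, 5) for slice_id in range(5)]

# Every document appears in exactly one slice.
seen = [d for s in slices for d in s]
print(len(seen) == len(doc_ids), sorted(seen) == sorted(doc_ids))
```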

Search After

Introduced in ES 5, search_after provides a stateless pagination mechanism similar to scroll but without maintaining a scroll ID.

Basic usage:

POST twitter/_search
{
  "size": 10,
  "query": { "match": { "title": "es" } },
  "sort": [ { "date": "asc" }, { "_id": "desc" } ]
}

The response contains a sort array for each hit; the last hit’s sort values are passed to the next request:

GET twitter/_search
{
  "size": 10,
  "query": { "match": { "title": "es" } },
  "search_after": [124648691, "624812"],
  "sort": [ { "date": "asc" }, { "_id": "desc" } ]
}

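Advantages: no scroll context to maintain, each page reflects the current state of the index, and the cost stays roughly constant regardless of depth. Drawbacks: the sort must include a unique tiebreaker field so that ordering is deterministic, pages can only be walked sequentially (no jumps to an arbitrary page), and without a point-in-time the result set may shift between requests as documents change.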
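
The mechanism is the same as keyset pagination in a relational database: instead of skipping rows, each request asks for hits that sort strictly after a known position. The following in-memory sketch mirrors the sort used above (date ascending, ID descending). It assumes numeric IDs so the descending order can be expressed by negation, and search_after_page is a hypothetical function rather than a client call.

```python
def sort_key(doc):
    """Order by date ascending, then id descending (ids are numeric here)."""
    return (doc["date"], -doc["id"])


def search_after_page(docs, size, search_after=None):
    """Return up to `size` docs sorting strictly after the given sort values."""
    ordered = sorted(docs, key=sort_key)
    if search_after is not None:
        after_key = (search_after[0], -search_after[1])
        ordered = [d for d in ordered if sort_key(d) > after_key]
    return ordered[:size]


docs = [{"id": i, "date": 100 + i // 2} for i in range(1, 8)]

pages = []
after = None
while True:
    page = search_after_page(docs, size=3, search_after=after)
    if not page:
        break
    pages.append([d["id"] for d in page])
    # The last hit's sort values become the cursor for the next request.
    after = [page[-1]["date"], page[-1]["id"]]

print(pages)  # [[1, 3, 2], [5, 4, 7], [6]]
```

Because the ID acts as a tiebreaker, documents that share a date are never skipped or repeated across page boundaries.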

ES 7 Changes

Since version 7.10, Elasticsearch recommends using search_after with a point-in-time (PIT) instead of scroll for deep pagination. A PIT is opened first:

POST /my-index-000001/_pit?keep_alive=1m

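Subsequent searches pass the returned PIT ID in the pit field and leave the index out of the request path, since the PIT already pins the target indices. The request below fetches the first page; later pages add search_after with the last hit's sort values.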

GET /_search
{
  "size": 10000,
  "query": { "match": { "user.id": "elkbee" } },
  "pit": { "id": "<pit_id>", "keep_alive": "1m" },
  "sort": [ { "@timestamp": { "order": "asc", "format": "strict_date_optional_time_nanos", "numeric_type": "date_nanos" } } ]
}
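
Building the request for the following page is mostly bookkeeping: copy the previous body, set search_after to the last hit's sort values, and carry forward the most recent PIT ID. The sketch below works on plain dictionaries; next_page_body is a hypothetical helper, and the sample response is hand-written for illustration rather than captured from a cluster.

```python
from copy import deepcopy


def next_page_body(previous_body, response):
    """Build the next search body, or return None when there are no more hits."""
    hits = response["hits"]["hits"]
    if not hits:
        return None
    body = deepcopy(previous_body)  # leave the caller's body untouched
    body["search_after"] = hits[-1]["sort"]
    # The PIT ID can change between searches, so prefer the one just returned.
    body["pit"]["id"] = response.get("pit_id", body["pit"]["id"])
    return body


first_body = {
    "size": 2,
    "query": {"match": {"user.id": "elkbee"}},
    "pit": {"id": "<pit_id>", "keep_alive": "1m"},
    "sort": [{"@timestamp": "asc"}],
}
sample_response = {  # hypothetical response, trimmed to the relevant fields
    "pit_id": "<new_pit_id>",
    "hits": {"hits": [{"_id": "1", "sort": [1000]}, {"_id": "2", "sort": [2000]}]},
}

second_body = next_page_body(first_body, sample_response)
print(second_body["search_after"], second_body["pit"]["id"])
```

When the traversal is finished, the PIT should be closed so the cluster can release the resources it holds.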

Performance Comparison

The benchmark chart from the original article is not reproduced here; it shows that from/size performance degrades sharply for deep pages, while search_after remains fast.

Backward Pagination

Elasticsearch does not provide a native API for backward pagination. It can be simulated by reversing the sort order, passing the sort values of the current page's first hit as search_after, and then reversing the returned hits so they display in the original order.
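
The reverse-and-flip trick is easy to get wrong, so here is a minimal in-memory sketch. It assumes a single unique numeric sort value per hit; page_after and page_before are hypothetical names.

```python
def page_after(values, size, after=None):
    """Forward page: ascending order, strictly greater than `after`."""
    ordered = sorted(values)
    if after is not None:
        ordered = [v for v in ordered if v > after]
    return ordered[:size]


def page_before(values, size, before):
    """Previous page: reverse the sort, search after `before`, then flip back."""
    descending = sorted(values, reverse=True)
    candidates = [v for v in descending if v < before]
    return list(reversed(candidates[:size]))


values = list(range(1, 11))
page1 = page_after(values, 3)
page2 = page_after(values, 3, after=page1[-1])
page3 = page_after(values, 3, after=page2[-1])

# Step back from page 3 using the first hit of the current page.
previous = page_before(values, 3, before=page3[0])
print(page3, previous)  # [7, 8, 9] [4, 5, 6]
```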

Final Recommendations

If the deepest page stays within the default 10,000-hit result window and only the top-N results are needed, use simple from/size pagination.

For large datasets and batch processing (e.g., data migration), use scroll.

For large datasets with real‑time, high‑concurrency queries, use search_after (preferably with PIT).

Personal Thoughts

Both scroll and search_after rely on cursor‑like mechanisms to solve deep pagination, but they are not ultimate solutions; deep pagination should be avoided whenever possible.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
