Master Elasticsearch Pagination: From/Size, Scroll & Search After

This article examines Elasticsearch's three pagination strategies—basic from/size, scroll-based deep paging, and the newer search after method—detailing their execution phases, performance implications, code examples, and practical recommendations for choosing the appropriate approach based on data volume and real‑time requirements.

MaGe Linux Operations

Nov 20, 2022

1. Introduction

Elasticsearch is a real-time distributed search and analytics engine commonly used for storing and quickly retrieving large amounts of unstructured data. Despite its advantages over relational databases, it shares their deep-paging problem: the deeper the requested page, the more work every query has to do. This article analyzes that issue and presents practical solutions.

2. from + size Pagination

The from + size method is the most basic pagination approach in Elasticsearch, similar to LIMIT with OFFSET in relational databases. from specifies the zero-based offset of the first hit to return, and size specifies the number of hits per page.

GET /wms_order_sku/_search
{
  "query": {"match_all": {}},
  "from": 10,
  "size": 20
}

Because from is a zero-based offset, this DSL skips the first 10 hits and returns the next 20, that is, hits 11 through 30.
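As a rough illustration of the offset arithmetic, the sketch below converts a 1-based page number into a from value and reports which hit ranks a request covers. PageWindow and its methods are hypothetical helpers written for this article, not part of any Elasticsearch client.

```java
// Hypothetical helper: these names do not come from an Elasticsearch client.
final class PageWindow {
    private PageWindow() {}

    /** Zero-based offset ("from") for a 1-based page number. */
    static int fromForPage(int pageNumber, int size) {
        if (pageNumber < 1 || size < 1) {
            throw new IllegalArgumentException("pageNumber and size must be >= 1");
        }
        return (pageNumber - 1) * size;
    }

    /** 1-based rank of the first hit returned for the given offset. */
    static int firstRank(int from) {
        return from + 1;
    }

    /** 1-based rank of the last hit returned, provided enough documents match. */
    static int lastRank(int from, int size) {
        return from + size;
    }
}
```

With from = 10 and size = 20, firstRank gives 11 and lastRank gives 30, which is the range the request above covers.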

2.1 Query Phase

The query phase determines which document IDs match the query. It consists of three steps:

Client sends the request to the coordinating node, which creates a priority queue of size from + size to store results.

The coordinating node broadcasts the request to relevant shards; each shard executes the search and stores its own from + size priority queue.

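Each shard returns the document IDs and sort values in its queue to the coordinating node, which merges them into a global priority queue and keeps the top from + size entries. With N shards, the coordinating node may have to merge up to N × (from + size) entries, which is why the cost grows with page depth.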
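The three steps above can be sketched as a toy in-memory model. This is only an illustration under simplifying assumptions: plain integer scores stand in for the (document ID, sort value) pairs, and QueryPhaseSketch with its methods is a hypothetical name, not Elasticsearch code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of the query phase; integer scores stand in for (doc ID, sort value) pairs.
final class QueryPhaseSketch {
    private QueryPhaseSketch() {}

    /** What one shard returns: its own top (from + size) scores, best first. */
    static List<Integer> shardTop(List<Integer> shardScores, int from, int size) {
        List<Integer> sorted = new ArrayList<>(shardScores);
        sorted.sort(Comparator.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(from + size, sorted.size())));
    }

    /** Coordinating node: merge every shard's queue, then cut out the requested page. */
    static List<Integer> page(List<List<Integer>> shards, int from, int size) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> shard : shards) {
            merged.addAll(shardTop(shard, from, size));
        }
        merged.sort(Comparator.reverseOrder());
        int start = Math.min(from, merged.size());
        int end = Math.min(from + size, merged.size());
        return new ArrayList<>(merged.subList(start, end));
    }

    /** Upper bound on the number of entries the coordinating node has to merge. */
    static long mergedEntries(int shardCount, int from, int size) {
        return (long) shardCount * ((long) from + size);
    }
}
```

For example, with 5 shards, from = 9990 and size = 10, mergedEntries reports 50,000 entries to merge in order to return just 10 hits.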

2.2 Fetch Phase

After the query phase, the fetch phase retrieves the actual document contents for the selected IDs. It also has three steps:

The coordinating node requests the full documents for the IDs stored in its priority queue.

Shards return the requested documents (only the size documents needed for the current page).

The coordinating node returns the final page to the client.

2.3 ES Example

private SearchHits getSearchHits(BoolQueryBuilder queryParam, int from, int size, String orderField) {
    SearchRequestBuilder searchRequestBuilder = this.prepareSearch();
    searchRequestBuilder.setQuery(queryParam).setFrom(from).setSize(size).setExplain(false);
    if (StringUtils.isNotBlank(orderField)) {
        searchRequestBuilder.addSort(orderField, SortOrder.DESC);
    }
    log.info("getSearchHits searchBuilder:{}", searchRequestBuilder.toString());
    SearchResponse searchResponse = searchRequestBuilder.execute().actionGet();
    log.info("getSearchHits searchResponse:{}", searchResponse.toString());
    return searchResponse.getHits();
}

2.4 Summary

By default, Elasticsearch caps the result window at 10,000 documents (index.max_result_window = 10000) and rejects any request whose from + size exceeds it. Beyond that limit, use the scroll API or search after rather than raising the window size.
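A client can check a request against the window before sending it. The guard below is a minimal sketch; ResultWindowGuard and fitsWindow are hypothetical names, and the constant simply mirrors the default value of index.max_result_window quoted above.

```java
// Hypothetical client-side guard; the names are illustrative only.
final class ResultWindowGuard {
    private ResultWindowGuard() {}

    /** Mirrors the default value of index.max_result_window. */
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    /** True when from + size stays within the configured result window. */
    static boolean fitsWindow(int from, int size, int maxResultWindow) {
        // Widen to long so a very large "from" cannot overflow the sum.
        return (long) from + size <= maxResultWindow;
    }
}
```

A caller could fall back to scroll or search after whenever fitsWindow returns false, instead of letting the request fail.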

3. Scroll Pagination

Scroll pagination works like a database cursor. The first request creates and caches a snapshot, returning a scroll_id. Subsequent requests use this scroll_id to retrieve the next batch, reducing the cost of repeated sorting and fetching.

3.1 Execution Process

During the query phase, the coordinating node still merges ID sets from shards, but it stores the merged IDs as a snapshot. The fetch phase then reads size documents from this snapshot, returning a new scroll_id for the next page.
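The snapshot behavior can be pictured with a small in-memory model. This is a toy sketch, not how Elasticsearch implements scroll contexts; ScrollSketch is a hypothetical class, and a list of ID strings stands in for the index.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model of a scroll cursor; not the Elasticsearch implementation.
final class ScrollSketch {
    private final List<String> snapshot; // frozen copy of the matching IDs
    private final int size;              // batch size per request
    private int position = 0;            // how far the cursor has advanced

    /** The first request copies the current state of the index. */
    ScrollSketch(List<String> liveIndex, int size) {
        this.snapshot = new ArrayList<>(liveIndex);
        this.size = size;
    }

    /** Returns the next batch from the snapshot; an empty list means the scroll is exhausted. */
    List<String> next() {
        int end = Math.min(position + size, snapshot.size());
        List<String> batch = new ArrayList<>(snapshot.subList(position, end));
        position = end;
        return batch;
    }
}
```

Documents added to the live list after the cursor is created never show up in later batches, and the cursor can only move forward.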

3.2 ES Example (initial request)

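GET /_search?scroll=1m
{
  "query": {"match_all": {}},
  "size": 20
}

The scroll keep-alive (here 1m) is passed as a URL parameter rather than in the request body. The response includes a _scroll_id; each follow-up request sends that ID, together with a new keep-alive, to the _search/scroll endpoint to fetch the next batch.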

3.3 Implementation Example

protected <T> Page<T> searchPageByConditionWithScrollId(BoolQueryBuilder queryParam, Class<T> targetClass, Page<T> page) throws Exception {
    SearchResponse scrollResp;
    String scrollId = ContextParameterHolder.get("scrollId");
    if (scrollId != null) {
        // Continue an existing scroll and renew its 60-second keep-alive.
        scrollResp = getTransportClient().prepareSearchScroll(scrollId).setScroll(new TimeValue(60000)).execute().actionGet();
    } else {
        // First request: open the scroll context. The default search type is used
        // because SearchType.QUERY_AND_FETCH is deprecated.
        logger.info("Scroll pagination, scrollId is null");
        scrollResp = this.prepareSearch()
            .setScroll(new TimeValue(60000))
            .setQuery(queryParam)
            .setSize(page.getPageSize()).execute().actionGet();
    }
    // A response may carry a new scroll ID, so always store the latest one.
    ContextParameterHolder.set("scrollId", scrollResp.getScrollId());
    SearchHit[] hits = scrollResp.getHits().getHits();
    List<T> list = new ArrayList<>(hits.length);
    for (SearchHit hit : hits) {
        // Class.newInstance() is deprecated; go through the no-arg constructor instead.
        T instance = targetClass.getDeclaredConstructor().newInstance();
        this.convertToBean(instance, hit);
        list.add(instance);
    }
    page.setTotalRow((int) scrollResp.getHits().getTotalHits());
    page.setResult(list);
    return page;
}

3.4 Summary

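Scroll pagination reduces query and sorting overhead, making it suitable for batch processing or non-real-time deep paging. However, it only moves forward: there is no way to jump to an arbitrary page or go back. It also cannot reflect real-time data changes because the snapshot is fixed, and every open scroll context holds resources on the cluster until it expires or is cleared.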

4. Search After Pagination

Introduced in Elasticsearch 5, the search‑after method also uses a cursor but records the last hit’s sort values, allowing real‑time data to be reflected without a snapshot.

4.1 Execution Process

Each hit in the response carries its sort values. The next request passes the last hit's sort values in the search_after parameter, so each shard only has to return the size hits that sort after those values instead of from + size hits.
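The paging rule can be sketched in memory as follows. This is a toy model under stated assumptions: SearchAfterSketch and Doc are hypothetical names, documents are reduced to a timestamp plus a unique id, and the sketch models only the paging semantics, not how shards evaluate the query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy in-memory model of search_after paging; not Elasticsearch code.
final class SearchAfterSketch {
    private SearchAfterSketch() {}

    /** A document reduced to its sort values: a timestamp plus a unique id. */
    static final class Doc {
        final long timestamp;
        final String id;

        Doc(long timestamp, String id) {
            this.timestamp = timestamp;
            this.id = id;
        }
    }

    /** Sort order: timestamp descending, then id descending as the tiebreaker. */
    static final Comparator<Doc> ORDER =
        Comparator.comparingLong((Doc d) -> d.timestamp).reversed()
            .thenComparing(Comparator.comparing((Doc d) -> d.id).reversed());

    /** Returns up to size docs that sort strictly after the given doc (null means first page). */
    static List<Doc> page(List<Doc> index, Doc after, int size) {
        List<Doc> sorted = new ArrayList<>(index);
        sorted.sort(ORDER);
        List<Doc> result = new ArrayList<>();
        for (Doc d : sorted) {
            if (after != null && ORDER.compare(d, after) <= 0) {
                continue; // at or before the cursor position: already served
            }
            result.add(d);
            if (result.size() == size) {
                break;
            }
        }
        return result;
    }
}
```

Because no state is kept between calls, a document indexed between two requests is simply sorted into place: it appears if it sorts after the cursor and is skipped otherwise.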

4.2 ES Example (first page)

GET /wms_order_sku2021_10/_search
{
  "query": {"bool": {"must": [{"range": {"shipmentOrderCreateTime": {"gte": "2021-10-12 00:00:00", "lt": "2021-10-15 00:00:00"}}}] }},
  "size": 20,
  "sort": [{"_id": {"order": "desc"}}, {"shipmentOrderCreateTime": {"order": "desc"}}]
}

4.3 Subsequent Page Example

GET /wms_order_sku2021_10/_search
{
  "query": {"bool": {"must": [{"range": {"shipmentOrderCreateTime": {"gte": "2021-10-12 00:00:00", "lt": "2021-10-15 00:00:00"}}}] }},
  "size": 20,
  "sort": [{"_id": {"order": "desc"}}, {"shipmentOrderCreateTime": {"order": "desc"}}],
  "search_after": ["SO-460_152-1447931043809128448-100017918838", 1634077436000]
}

4.4 Implementation Example

public <T> ScrollDto<T> queryScrollDtoByParamWithSearchAfter(BoolQueryBuilder queryParam, Class<T> targetClass, int pageSize, String afterId, List<FieldSortBuilder> fieldSortBuilders) {
    SearchRequestBuilder builder = this.prepareSearch()
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(queryParam)
        .setSize(pageSize);
    // Caller-supplied sorts come first; _id is appended as the unique tiebreaker.
    if (CollectionUtils.isNotEmpty(fieldSortBuilders)) {
        fieldSortBuilders.forEach(builder::addSort);
    }
    builder.addSort("_id", SortOrder.DESC);
    // afterId holds the previous page's last sort values as a JSON array; blank means first page.
    if (StringUtils.isNotBlank(afterId)) {
        builder.searchAfter(JSON.parseObject(afterId, Object[].class));
    }
    SearchResponse response = builder.execute().actionGet();
    SearchHit[] hits = response.getHits().getHits();
    List<T> list = new ArrayList<>();
    if (ArrayUtils.getLength(hits) > 0) {
        // Convert each hit's source map to the target type via a JSON round trip.
        list = Arrays.stream(hits)
            .filter(Objects::nonNull)
            .map(SearchHit::getSourceAsMap)
            .filter(Objects::nonNull)
            .map(JSON::toJSONString)
            .map(e -> JSON.parseObject(e, targetClass))
            .collect(Collectors.toList());
        // The last hit's sort values become the cursor for the next page.
        afterId = JSON.toJSONString(hits[hits.length - 1].getSortValues());
    }
    return ScrollDto.<T>builder()
        .scrollId(afterId)
        .result(list)
        .totalRow((int) response.getHits().getTotalHits())
        .build();
}

4.5 Summary

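Search after requires the sort to include at least one globally unique field as a tiebreaker (e.g., a timestamp combined with _id); otherwise documents that share the same sort values can be skipped or repeated across pages. It is stateless, reflects real-time changes, and avoids the resource overhead of maintaining a scroll snapshot, making it well suited to high-concurrency, real-time deep paging. Like scroll, it only moves forward and cannot jump to an arbitrary page.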
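The effect of a missing tiebreaker can be reproduced with a small in-memory experiment. The sketch below is illustrative only: TiebreakerSketch and Row are hypothetical names, and the paging rule is the same "strictly after the last sort values" rule that search after applies.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy experiment: page through rows with and without a unique tiebreaker in the sort.
final class TiebreakerSketch {
    private TiebreakerSketch() {}

    static final class Row {
        final long timestamp;
        final String id;

        Row(long timestamp, String id) {
            this.timestamp = timestamp;
            this.id = id;
        }
    }

    /** Pages through all rows and returns the ids that were actually served. */
    static List<String> pageThroughAll(List<Row> rows, int size, boolean useIdTiebreaker) {
        Comparator<Row> order = Comparator.comparingLong((Row r) -> r.timestamp).reversed();
        if (useIdTiebreaker) {
            order = order.thenComparing((Row r) -> r.id);
        }
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(order);
        List<String> seen = new ArrayList<>();
        Row after = null;
        while (true) {
            List<Row> page = new ArrayList<>();
            for (Row r : sorted) {
                // Keep only rows that sort strictly after the last served row.
                if (after != null && order.compare(r, after) <= 0) {
                    continue;
                }
                page.add(r);
                if (page.size() == size) {
                    break;
                }
            }
            if (page.isEmpty()) {
                return seen;
            }
            for (Row r : page) {
                seen.add(r.id);
            }
            after = page.get(page.size() - 1);
        }
    }
}
```

With three rows sharing one timestamp and a page size of 2, sorting by timestamp alone loses the third row at the page boundary, while adding the id as a tiebreaker serves every row exactly once.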

5. Overall Comparison and Thoughts

5.1 Comparison of the Three Methods

For small result sets (<10,000 documents) or when only the top‑N results are needed, use the simple from/size approach.

For large data volumes and batch processing tasks (e.g., data migration), prefer the scroll API.

For large data volumes with real‑time, high‑concurrency user queries, choose the search‑after method.

5.2 Personal Reflections

Typical business pages show 10‑20 items per page; most users rarely need to navigate beyond a few hundred pages, so limiting queries to 10,000 results is often sufficient.

When exporting massive datasets, either scroll or search after can be used. Search after picks up documents written during the export, whereas scroll reads from a fixed snapshot.

Avoid deep pagination whenever possible; if necessary, adjust max_result_window cautiously, but prefer the specialized pagination methods above.
