Why Elasticsearch Pagination Takes 10 Minutes and How to Reduce It to Seconds

This article examines a real‑world Elasticsearch pagination case where a range query across multiple indices took ten minutes, analyzes the root causes such as deep pagination, large time windows, and multi‑index scans, and presents concrete optimizations—including reducing page size, narrowing the time range, switching to search_after, and using index aliases—to bring query time down to seconds.

dbaplus Community

Problem Description

A range query on multiple applcation* indices sorted by timestamp and _uuid_ took about ten minutes to return the first 100 hits. The query covers a seven‑day window (timestamps 1743609600000 to 1744214400000) and uses the from/size pagination method with from=0 and size=100. The total hit count exceeds 600 million, causing massive scanning, sorting and memory consumption.
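To make the window concrete, the two epoch-millisecond bounds can be decoded with Python's standard library. This is only a sketch: the UTC+8 offset is an assumption (the source does not state a time zone), chosen because both bounds fall exactly on midnight there.

```python
from datetime import datetime, timedelta, timezone

START_MS = 1743609600000  # lower bound from the query
END_MS = 1744214400000    # upper bound from the query

# Assumption: UTC+8 is used for display only; the query itself works on epoch millis.
UTC8 = timezone(timedelta(hours=8))


def from_millis(ms: int, tz: timezone = timezone.utc) -> datetime:
    """Convert epoch milliseconds to a timezone-aware datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=tz)


window = from_millis(END_MS) - from_millis(START_MS)
print(window.days)                              # 7
print(from_millis(START_MS).isoformat())        # 2025-04-02T16:00:00+00:00
print(from_millis(START_MS, UTC8).isoformat())  # 2025-04-03T00:00:00+08:00
```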

Root Cause Analysis

Large result set: Even with from=0, every one of the 600 M matching documents has to be visited and compared against the current top 100, which is expensive.

Wide time range: Seven days of data (~85 M docs per day) forces a full scan of the range.

Wildcard index pattern: applcation* may match dozens or hundreds of indices, each of which is searched separately before the results are merged.

Large page size: Returning 100 documents per page increases network and serialization overhead, especially if documents contain large fields.

Dual‑field sort: Sorting on both timestamp and _uuid_ means the values of both fields are read and compared for every matching document, consuming additional CPU and memory.
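The cost of returning a small page from a huge hit set can be illustrated with a toy model. The sketch below is not Elasticsearch code; it only mimics the general shape of a distributed top-N search, where each shard keeps its best from + size candidates and a coordinating step merges them. The function names and the toy data are made up for illustration.

```python
import heapq
import random


def sort_key(doc):
    # Mirrors the article's sort: timestamp first, then _uuid_ as tiebreaker.
    return (doc["timestamp"], doc["_uuid_"])


def shard_top_k(docs, k):
    """Per shard: every matching doc is visited, but only the best k are kept."""
    return heapq.nlargest(k, docs, key=sort_key)


def coordinator_merge(per_shard_hits, frm, size):
    """Coordinator: merge all shard candidates, then slice out the requested page."""
    candidates = (hit for hits in per_shard_hits for hit in hits)
    merged = heapq.nlargest(frm + size, candidates, key=sort_key)
    return merged[frm:frm + size]


# Toy data: 5 shards x 10,000 docs stand in for many shards x millions of docs.
random.seed(0)
shards = [
    [{"timestamp": random.randrange(10**6), "_uuid_": f"s{s}-d{i}"} for i in range(10_000)]
    for s in range(5)
]

frm, size = 0, 100
per_shard = [shard_top_k(docs, frm + size) for docs in shards]
page = coordinator_merge(per_shard, frm, size)
print(len(page), sum(len(hits) for hits in per_shard))  # 100 500
```

In this model the page itself is cheap to merge (500 candidates for 100 results), while the work that grows with the data is the per-shard pass over every matching document. That is why shrinking the time window and the set of indices matters more than the page size alone.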

Optimization Strategies

Reduce the page size.

Limit the time window to the smallest necessary range.

Replace from/size with search_after for deep pagination.

Use index aliases (or concrete index names) instead of wildcard patterns.

Implementation Details

Step 1 – Reduce size and time window

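Change size from 100 to 10 and restrict the query to a single day. The bounds used below, 1743609600000 to 1743696000000, cover the first 24 hours of the original window (2025‑04‑03 if the timestamps are read in UTC+8).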

POST /applcation*/_search
{
  "from": 0,
  "query": {
    "bool": {
      "filter": {
        "range": {
          "timestamp": {
            "from": "1743609600000",
            "include_lower": true,
            "include_upper": false,
            "to": "1743696000000"   // one‑day window
          }
        }
      }
    }
  },
  "size": 10,
  "sort": [
    {"timestamp": {"missing": "_last", "order": "desc", "unmapped_type": "keyword"}},
    {"_uuid_":   {"missing": "_last", "order": "desc", "unmapped_type": "keyword"}}
  ]
}

Result: data volume drops to ~1/7 of the original, dramatically reducing scan and sort time.
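If the request is generated from application code, the window and page size can be made explicit parameters. The helper below is a minimal sketch that reproduces the Step 1 body as a Python dict; build_page_query and DAY_MS are hypothetical names, and sending the body to the cluster is left to whatever client is in use.

```python
import json

DAY_MS = 24 * 60 * 60 * 1000  # one day in milliseconds


def build_page_query(start_ms: int, end_ms: int, size: int = 10) -> dict:
    """Build the Step 1 request body for the half-open window [start_ms, end_ms)."""
    if end_ms <= start_ms:
        raise ValueError("end_ms must be greater than start_ms")
    sort_opts = {"missing": "_last", "order": "desc", "unmapped_type": "keyword"}
    return {
        "from": 0,
        "query": {"bool": {"filter": {"range": {"timestamp": {
            "from": str(start_ms),
            "include_lower": True,
            "include_upper": False,
            "to": str(end_ms),
        }}}}},
        "size": size,
        "sort": [{"timestamp": dict(sort_opts)}, {"_uuid_": dict(sort_opts)}],
    }


# One-day window starting at the original lower bound.
body = build_page_query(1743609600000, 1743609600000 + DAY_MS)
print(json.dumps(body, indent=2))
```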

Step 2 – Use search_after for pagination

When the full seven‑day range is required, keep size=10 and paginate using the search_after values from the last hit of the previous page.

// First page (same as Step 1, but with the full seven-day range)
POST /applcation*/_search
{ ... "size": 10, "sort": [...] }

// Capture the last hit's sort values, e.g. ["1744214380000", "uuid456"]

// Next page
POST /applcation*/_search
{
  "query": {"bool": {"filter": {"range": {"timestamp": {"from": "1743609600000", "include_lower": true, "include_upper": false, "to": "1744214400000"}}}}},
  "size": 10,
  "search_after": ["1744214380000", "uuid456"],
  "sort": [
    {"timestamp": {"missing": "_last", "order": "desc", "unmapped_type": "keyword"}},
    {"_uuid_":   {"missing": "_last", "order": "desc", "unmapped_type": "keyword"}}
  ]
}

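Provided _uuid_ is unique per document, the combination of timestamp and _uuid_ gives every hit a distinct sort key, so search_after can resume exactly after the last hit of the previous page without skipping or repeating documents.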
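The paging loop itself is short. The sketch below assumes a caller-supplied search callable that takes a request body and returns a response shaped like an Elasticsearch search response (hits.hits[].sort). Both iter_hits and the in-memory fake_search are hypothetical and exist only to show the control flow, including how the _uuid_ tiebreaker behaves when timestamps collide.

```python
from typing import Callable, Iterator


def iter_hits(search: Callable[[dict], dict], body: dict, max_pages: int = 1000) -> Iterator[dict]:
    """Yield hits page by page, feeding each page's last sort values into search_after."""
    # search_after is not combined with a non-zero "from", so drop the key entirely.
    request = {k: v for k, v in body.items() if k != "from"}
    for _ in range(max_pages):
        hits = search(request)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        request = {**request, "search_after": hits[-1]["sort"]}


# 25 toy documents; pairs share a timestamp so the _uuid_ tiebreaker matters.
DOCS = [{"timestamp": 1744214380000 - i // 2, "_uuid_": f"uuid{i:03d}"} for i in range(25)]


def fake_search(request: dict) -> dict:
    """In-memory stand-in for a search call, sorted descending on (timestamp, _uuid_)."""
    docs = sorted(DOCS, key=lambda d: (d["timestamp"], d["_uuid_"]), reverse=True)
    after = request.get("search_after")
    if after is not None:
        docs = [d for d in docs if (d["timestamp"], d["_uuid_"]) < tuple(after)]
    page = docs[: request["size"]]
    return {"hits": {"hits": [{"_source": d, "sort": [d["timestamp"], d["_uuid_"]]} for d in page]}}


body = {"from": 0, "size": 10, "sort": [{"timestamp": "desc"}, {"_uuid_": "desc"}]}
hits = list(iter_hits(fake_search, body))
print(len(hits))  # 25
```

The max_pages guard is a design choice for the sketch: it bounds the loop if the backend keeps returning data, which is useful when the hit set is as large as the one described here.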

Step 3 – Introduce index aliases

Create daily aliases that point to the concrete daily indices, then query the alias instead of the wildcard.

POST /_aliases
{
  "actions": [
    {"add": {"index": "applcation-2025-04-03", "alias": "applcation-day-20250403"}},
    {"add": {"index": "applcation-2025-04-04", "alias": "applcation-day-20250404"}}
  ]
}

POST /applcation-day-20250403/_search
{
  "query": {"bool": {"filter": {"range": {"timestamp": {"from": "1743609600000", "include_lower": true, "include_upper": false, "to": "1743696000000"}}}}},
  "size": 10,
  "sort": [
    {"timestamp": {"missing": "_last", "order": "desc", "unmapped_type": "keyword"}},
    {"_uuid_":   {"missing": "_last", "order": "desc", "unmapped_type": "keyword"}}
  ]
}

Effect: only the single‑day index is scanned, cutting I/O and CPU usage.
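Writing one alias action per day by hand does not scale, so the _aliases body is a natural candidate for generation. The sketch below assumes the daily index naming pattern shown in this step (prefix-YYYY-MM-DD for the index, prefix-day-YYYYMMDD for the alias). daily_alias_actions is a hypothetical helper, and the example dates are chosen to match the query window under the assumption that the timestamps are read in UTC+8.

```python
from datetime import date, timedelta


def daily_alias_actions(first_day: date, days: int, prefix: str = "applcation") -> dict:
    """Build an _aliases request body with one 'add' action per daily index."""
    actions = []
    for offset in range(days):
        day = first_day + timedelta(days=offset)
        actions.append({"add": {
            "index": f"{prefix}-{day:%Y-%m-%d}",
            "alias": f"{prefix}-day-{day:%Y%m%d}",
        }})
    return {"actions": actions}


# Seven daily aliases covering the original seven-day window.
body = daily_alias_actions(date(2025, 4, 3), 7)
for action in body["actions"]:
    print(action["add"]["index"], "->", action["add"]["alias"])
```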

Result Summary

Smaller size reduces network transfer and sorting heap size.

Narrowing the time range cuts the hit count by roughly a factor of seven.

search_after eliminates deep‑pagination overhead, turning minute‑level queries into second‑level ones.

Index aliases limit the number of indices involved in the search, further improving performance.

