
Boost Elasticsearch Queries on Billions of Docs: Filesystem Cache & Smart Design

Elasticsearch query performance on billions of documents can improve dramatically when you rely on the OS filesystem cache, limit the fields you index, separate hot and cold data, pre-warm caches, and paginate with scroll or search_after, while avoiding costly joins and keeping the searchable data small enough to fit in memory.

21CTO

When interviewers ask how to improve Elasticsearch (ES) query efficiency on tens of billions of records, the useful answer comes from practical experience rather than theory: at this scale, ES is often slower than people assume.

Initial searches on massive datasets (hundreds of millions of documents) can take 5–10 seconds, but subsequent queries may drop to a few hundred milliseconds as caches warm up.

Filesystem Cache as the Key Optimizer

ES stores all index data in files on disk, and the operating system caches those files in the filesystem cache automatically. If enough memory is left free for that cache, ideally enough to hold all index segment files, most queries are served entirely from memory with millisecond-level latency.

Example: a three-node ES cluster has 64 GB of RAM per node (192 GB total). Giving each node a 32 GB JVM heap leaves only 32 GB per node for the filesystem cache (96 GB total). If the total index size is 1 TB (a little over 330 GB per node), only about a tenth of the data fits in cache, so many queries hit disk and take 5–10 s.

Best practice: leave enough memory for the filesystem cache to hold at least half of the total index data, and ideally keep the indexed data small enough to fit in the cache entirely.
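
The arithmetic in the example can be checked with a few lines of Python. This is a rough sketch: the function name is made up for illustration, and it treats all non-heap RAM as available for caching, which is an upper bound because the OS and other processes need some of it too.

```python
def cache_coverage(nodes: int, ram_gb_per_node: float,
                   heap_gb_per_node: float, index_size_gb: float) -> float:
    """Return the fraction of on-disk index data that could sit in the filesystem cache."""
    # Memory not claimed by the JVM heap is what the OS can use to cache files.
    cache_gb = nodes * (ram_gb_per_node - heap_gb_per_node)
    return cache_gb / index_size_gb


# Figures from the example: 3 nodes, 64 GB RAM and 32 GB heap each, 1 TB of index data.
coverage = cache_coverage(nodes=3, ram_gb_per_node=64,
                          heap_gb_per_node=32, index_size_gb=1024)
meets_half_rule = coverage >= 0.5  # the "at least half" guideline

print(f"cache covers {coverage:.1%} of the index; half rule met: {meets_half_rule}")
```

With these inputs the cluster falls well short of the guideline, which matches the slow queries described in the example.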

Hybrid ES + HBase Architecture

Store only the fields needed for search (e.g., id, name, age) in ES, and keep the remaining fields in a storage system like MySQL or HBase. After retrieving a small set of document IDs from ES, fetch the full records from HBase, reducing ES storage pressure and improving cache efficiency.
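
The two-step lookup might be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the field name `name`, the canned response, and the dict standing in for HBase are all assumptions. In a real system the search body would be sent to ES and `get_row` would call the HBase or MySQL client.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


def build_id_query(name: str, size: int = 10) -> dict:
    """Search body that matches on an indexed field and asks for IDs only."""
    return {
        "query": {"match": {"name": name}},
        "_source": False,  # do not return document bodies; each hit still has _id
        "size": size,
    }


def extract_ids(response: dict) -> List[str]:
    """Pull document IDs out of a search response."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def fetch_full_records(ids: List[str],
                       get_row: Callable[[str], Optional[Record]]) -> List[Record]:
    """Look up each ID in the primary store, skipping IDs that are missing there."""
    rows = (get_row(doc_id) for doc_id in ids)
    return [row for row in rows if row is not None]


# Stand-ins for demonstration only: a canned ES response and a dict playing HBase.
fake_response = {"hits": {"hits": [{"_id": "u1"}, {"_id": "u3"}]}}
primary_store = {
    "u1": {"id": "u1", "name": "Ann", "age": 31, "bio": "long text kept out of ES"},
    "u2": {"id": "u2", "name": "Bob", "age": 45, "bio": "..."},
}

records = fetch_full_records(extract_ids(fake_response), primary_store.get)
print([r["id"] for r in records])
```

Because ES returns only a small page of IDs, the second lookup stays cheap even when the full records are large.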

Data Warm‑up

Periodically query hot data (e.g., popular user profiles or frequently viewed products) to keep it resident in the filesystem cache. Automated background jobs can “pre‑warm” this data, ensuring subsequent user requests hit memory instead of disk.
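
A warm-up job could look roughly like this. The sketch assumes a hypothetical `run_query` callable that sends a search body to ES; the query bodies and field names are placeholders. A scheduler such as cron would call `warm_up` at whatever interval suits the workload.

```python
from typing import Callable, Iterable


def warm_up(run_query: Callable[[dict], object], hot_queries: Iterable[dict]) -> int:
    """Run each hot query once so the index files it touches are read into the OS cache.

    Returns the number of queries that completed without raising.
    """
    warmed = 0
    for body in hot_queries:
        try:
            run_query(body)
            warmed += 1
        except Exception as exc:  # one failed query should not stop the whole job
            print(f"warm-up query failed: {exc}")
    return warmed


# Placeholder queries for "popular" items; a real job would derive these from access logs.
hot_queries = [
    {"query": {"term": {"user_id": "u1"}}},
    {"query": {"term": {"product_id": "p42"}}},
]

sent = []                       # records what would have been sent to ES
warmed = warm_up(sent.append, hot_queries)
print(f"warmed {warmed} queries")
```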

Cold‑Hot Data Separation

Separate rarely accessed (cold) data into its own ES index and allocate it to different nodes than the hot index. This prevents cold data from evicting hot data from the cache, maintaining high performance for the majority of queries.
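
One way to pin the two indices to different nodes is shard allocation filtering on a custom node attribute. The sketch below builds the index settings as Python dicts. The attribute name `box_type`, the index names, and the age-based rule for deciding what counts as hot are all assumptions for illustration; each node would need a matching `node.attr.box_type` value in its configuration.

```python
from datetime import date, timedelta

HOT_DAYS = 30  # assumption: documents newer than this are treated as hot


def index_settings(tier: str) -> dict:
    """Index settings that require shards to live on nodes tagged with the given tier."""
    return {"settings": {"index.routing.allocation.require.box_type": tier}}


def index_for(doc_date: date, today: date) -> str:
    """Pick the target index for a document based on its age."""
    is_hot = today - doc_date <= timedelta(days=HOT_DAYS)
    return "docs_hot" if is_hot else "docs_cold"


hot_settings = index_settings("hot")    # body for creating the docs_hot index
cold_settings = index_settings("cold")  # body for creating the docs_cold index
print(index_for(date(2024, 1, 1), today=date(2024, 1, 10)))
```

If access frequency rather than age defines hot data, only `index_for` would need to change.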

Document Model Design

Avoid complex joins, nested queries, and parent‑child relationships in ES. Instead, denormalize data during ingestion so that searches require no runtime joins. Keep the document model simple and aligned with the search use‑case.
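
Denormalizing at ingestion time can be as simple as copying the related fields into each document before it is indexed. The order and customer shapes below are hypothetical and only show the idea.

```python
from typing import Dict


def denormalize(order: dict, customers: Dict[str, dict]) -> dict:
    """Build a flat search document by copying customer fields into the order."""
    customer = customers[order["customer_id"]]
    return {
        "order_id": order["id"],
        "total": order["total"],
        "customer_id": order["customer_id"],
        # Copied in at write time so that searches need no join at query time.
        "customer_name": customer["name"],
        "customer_city": customer["city"],
    }


customers = {"c1": {"name": "Ann", "city": "Berlin"}}
order = {"id": "o100", "customer_id": "c1", "total": 59.9}

doc = denormalize(order, customers)
print(doc)
```

The trade-off is that a change to the customer record means reindexing the affected documents, which is usually acceptable when reads far outnumber such updates.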

Pagination Performance

Deep pagination is costly because ES must collect and sort large result sets from each shard. Instead of allowing arbitrary page jumps, limit pagination depth or use alternatives:

Scroll API: takes a snapshot of the result set and returns successive pages via a scroll_id. Pages typically return in milliseconds, which suits sequential reads such as an infinite-scroll feed.

search_after: uses the sort values of the last hit on the current page to fetch the next page. Like scroll, it only supports moving forward.

Neither method supports jumping to an arbitrary page. search_after also needs a deterministic sort order, so include a unique field as a tiebreaker.
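
A forward-only loop with search_after might be sketched like this. The `search` callable is a hypothetical stand-in for whatever sends the body to ES, and the sort fields `created_at` and `doc_id` are assumed names. The second sort field acts as a unique tiebreaker so that hits with equal timestamps keep a stable order between pages.

```python
from typing import Callable, Iterator, List, Optional


def build_page_request(page_size: int, search_after: Optional[List] = None) -> dict:
    """Search body for one page; pass the previous page's last sort values to continue."""
    body = {
        "size": page_size,
        "query": {"match_all": {}},
        # Deterministic order: newest first, then a unique field to break ties.
        "sort": [{"created_at": "desc"}, {"doc_id": "asc"}],
    }
    if search_after is not None:
        body["search_after"] = search_after
    return body


def iterate_hits(search: Callable[[dict], dict], page_size: int = 100) -> Iterator[dict]:
    """Yield every hit in sort order, requesting one page at a time."""
    cursor = None
    while True:
        response = search(build_page_request(page_size, cursor))
        hits = response["hits"]["hits"]
        if not hits:
            return  # an empty page means the result set is exhausted
        yield from hits
        cursor = hits[-1]["sort"]  # sort values of the last hit become the next cursor
```

Each request asks only for the next `page_size` hits after the cursor, so the cost per page stays flat however far the reader has scrolled.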

Overall, improving ES performance at massive scale involves maximizing filesystem cache usage, minimizing indexed fields, separating hot and cold data, pre‑warming caches, simplifying document structures, and adopting pagination strategies that avoid deep page jumps.
