Supercharging Elasticsearch for Billion-Row Queries: Practical Tips
This guide details how to optimize Elasticsearch for handling billions of daily records, covering core Lucene concepts, index and shard configuration, performance‑tuning parameters, and practical testing methods to achieve sub‑second query responses and long‑term data retention.
Requirement Overview
The platform processes over a hundred million rows per day, partitioned by day, but must retain only three months of data while supporting cross‑month queries and exporting data older than one year. The goal is sub‑second query latency and the ability to query data across months.
Elasticsearch Retrieval Fundamentals
Understanding Elasticsearch (ES) starts with its underlying Lucene architecture. Key components include:
Cluster : a group of nodes.
Node : a single ES service instance.
Index : a logical namespace that contains one or more physical shards.
Type : classification within an index (limited to one type after ES 6.x).
Document : the basic JSON unit indexed.
Shard : a low‑level Lucene segment storing a subset of data.
Replica : a copy of a shard for redundancy and load balancing.
Lucene stores data in segments , each containing multiple documents and fields. The inverted index maps terms to document IDs, while DocValues provide column‑store structures for fast sorting, faceting, and aggregations.
Lucene Index Structure
Lucene’s on‑disk files include dictionaries, posting lists, forward files, and .fdt (DocValues) files. Random reads on .fdt are costly; SSDs are recommended for these files. Scoring can be disabled to reduce CPU load.
Sharding and Routing
Each ES index consists of multiple primary shards, each a Lucene index. Document routing determines the target shard:
shard = hash(routing) % number_of_primary_shardsBy default, routing is the document ID (MurmurHash3). Supplying a custom _routing value can co‑locate related documents on the same shard, reducing query overhead.
Optimization Cases
1. Index‑Performance Tuning
Bulk indexing with batch sizes of a few hundred to a few thousand documents.
Multi‑threaded ingestion, scaling thread count to the number of machines.
Increase refresh_interval (e.g., set to -1) and manually refresh after bulk load.
Allocate ~50% of system memory to Lucene file cache; nodes with 64 GB+ RAM are recommended.
Use SSDs; RAID5/10 on HDDs still suffers from random I/O latency.
Prefer custom IDs that match HBase row keys for easier updates/deletes.
Configure merge throttling and thread count, e.g.,
"indices.store.throttle.max_bytes_per_sec": "200mb"
index.merge.scheduler.max_thread_count: 1 // for HDD
index.merge.scheduler.max_thread_count: 6 // for SSD (as used in the case)2. Search‑Performance Tuning
Disable doc_values on fields that are never sorted or aggregated.
Prefer keyword fields over long/int for exact term queries.
Turn off _source storage for fields not needed in the response.
Replace scoring queries with filter or constant_score when relevance is irrelevant.
Pagination strategies:
from + size : limited by index.max_result_window (default 10 000).
search_after : use the last hit’s sort values for deep pagination.
scroll : for large result sets, at the cost of maintaining a scroll_id.
Introduce a combined long field (timestamp + ID) to support efficient sorting.
Allocate CPUs with ≥16 cores for heavy sorting workloads.
Set "merge.policy.expunge_deletes_allowed": "0" to force immediate removal of deleted docs during merges.
Performance Testing
Benchmarks were conducted on:
Single‑node handling 50 M–100 M records to gauge point‑load limits.
Cluster tests with 100 M–3 B records, measuring disk I/O, memory, CPU, and network usage.
Random query combinations across varying data volumes.
SSD vs. HDD performance comparison.
Results showed sub‑second response times for 100‑record queries on billions of rows, confirming the effectiveness of the applied optimizations.
Production Impact
After deploying the optimizations, the platform consistently returns 100‑record queries within 3 seconds, with fast pagination. Future scaling can be achieved by adding nodes to distribute load.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
