How to Achieve Sub‑Second Queries on Billions of Records with Elasticsearch
This article explains how a data platform handling billions of daily records can be optimized for cross‑month queries and sub‑second response times by tuning Elasticsearch indexing, shard routing, Lucene structures, and hardware configurations.
1. Introduction
The data platform has evolved through three versions and initially faced common challenges; the author now shares a refined documentation focusing on Elasticsearch (ES) optimization, while HBase and Hadoop design optimizations are referenced elsewhere.
2. Requirements
Project background: A business system stores over a hundred million rows per day in sharded tables, retains only three months of data, and incurs high costs for additional shards.
Improvement goals:
Enable cross‑month queries and support over a year of historical data export.
Achieve second‑level query response times for conditional searches.
3. Elasticsearch Retrieval Principles
3.1 Basic ES and Lucene Architecture
Understanding component fundamentals helps locate bottlenecks. The ES cluster consists of clusters, nodes, indices, types, documents, shards, and replicas, as illustrated below.
Cluster – a group of nodes. Node – a single ES service instance. Index – logical namespace containing one or more physical shards. Type – classification within an index (single type after ES 6.x). Document – basic indexable unit, e.g., a JSON object. Shards – low‑level work units, each a Lucene instance. Replicas – shard copies for fault tolerance and query load distribution.
ES relies on Lucene; optimizing data structures essentially means optimizing Lucene. Lucene stores data in segments, each containing multiple documents and fields, which are tokenized into terms.
Key Lucene files include dictionaries, inverted lists, forward files, and DocValues. Random disk reads in Lucene are costly; SSDs are recommended for .tim and .doc files, while .fdt files consume significant space.
3.2 Lucene Index Implementation
Lucene’s index files consist of dictionaries, inverted tables, forward files, and DocValues, as shown in the diagrams.
http://lucene.apache.org/core/7_2_1/core/org/apache/lucene/codecs/lucene70/package-summary.html#package.description
DocValues provide column‑store access for fast sorting and aggregation. ES enables DocValues by default for all fields except analyzed strings; disabling unnecessary DocValues reduces resource usage.
3.3 ES Index and Search Sharding
ES indices are composed of one or more Lucene indices, each containing multiple segments. Data placement follows the formula: shard = hash(routing) % number_of_primary_shards By default, the routing parameter is the document ID (MurmurHash3). Supplying a consistent _routing value groups related data on the same shard, reducing search overhead.
4. Optimization Cases
In the presented case, queries are limited to fixed fields (no full‑text search), enabling sub‑second responses for billions of records.
ES stores only HBase row keys, not full data.
Actual data resides in HBase and is retrieved via row‑key lookups.
Indexing and search performance improvements follow official ES guidelines.
4.1 Indexing Performance
Batch writes – each batch contains hundreds to thousands of records.
Multi‑threaded writes – thread count matches the number of machines; monitor performance via Kibana.
Increase segment refresh interval – set "refresh_interval": "-1" and manually refresh after bulk ingestion.
Memory allocation – allocate ~50% of system memory to Lucene file cache; nodes often require >64 GB RAM.
SSD storage – SSDs outperform RAID5/10 for random I/O.
Use auto‑generated IDs – custom keys (e.g., HBase row keys) are acceptable when update/delete operations are needed.
Segment merging – limit merge throughput (default 20 MB/s) and configure thread count:
Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2)). For SSDs, six merge threads were used.
4.2 Search Performance
Disable DocValues for fields that do not require sorting or aggregation.
Prefer keyword over numeric types for term queries.
Disable _source storage for fields not needed in query results.
Turn off scoring when unnecessary; use filter or constant_score queries.
Pagination strategies: from + size – limited by index.max_result_window (default 10 000). search_after – uses the last hit of the previous page. scroll – for large result sets, but requires maintaining a scroll_id.
Combine time and ID into a single long field to improve sorting performance.
Allocate CPUs with ≥16 cores for sorting‑intensive workloads.
Set "merge.policy.expunge_deletes_allowed": "0" to force deletion of marked records during merges.
5. Performance Testing
Baseline benchmarks are essential to measure improvements. Tests conducted include:
Single‑node load of 50 M–100 M records to assess point‑node capacity.
Cluster tests with 1 B–3 B records to evaluate disk I/O, memory, CPU, and network consumption.
Random query combinations across data volumes.
SSD vs. HDD performance comparison.
Comprehensive testing helps identify bottlenecks and validates optimization effectiveness.
6. Production Results
After applying the optimizations, the platform runs stably, handling tens of billions of records with 100‑record queries returning within 3 seconds, and pagination remains fast. Future bottlenecks can be addressed by scaling nodes.
Source: https://www.cnblogs.com/mikevictor07/p/10006553.html
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
