Elasticsearch Indexing and Search Optimization for Billion‑Scale Data
This article explains how to design, tune, and optimize an Elasticsearch‑based data platform handling hundreds of billions of records, covering Lucene fundamentals, shard routing, index and query performance tricks, and practical benchmark results for large‑scale deployments.
Introduction The author shares lessons learned from three iterations of a data platform that now stores hundreds of billions of records in Elasticsearch, focusing on index optimization and query performance.
Requirement Overview The system must support cross‑month queries and export of over one year of data while returning query results within seconds, despite daily tables containing over 100 million rows and a three‑month retention limit in the relational database.
Elasticsearch Retrieval Principles
3.1 Elasticsearch and Lucene Basics
Understanding the underlying components is essential for optimization. An Elasticsearch index consists of multiple Lucene shards, each containing segments that store documents.
Cluster 包含多个Node的集群
Node 集群服务单元
Index 一个ES索引包含一个或多个物理分片,它只是这些分片的逻辑命名空间
Type 一个index的不同分类,6.x后只能配置一个type,以后将移除
Document 最基础的可被索引的数据单元,如一个JSON串
Shards 一个分片是一个底层的工作单元,它仅保存全部数据中的一部分,它是一个Lucence实例 (一个lucene索引最大包含2,147,483,519 (= Integer.MAX_VALUE - 128)个文档数量)
Replicas 分片备份,用于保障数据安全与分担检索压力Lucene stores inverted indexes, forward files, and DocValues. Access patterns such as sorting, faceting, and aggregations rely heavily on DocValues.
3.2 Lucene Index Implementation
Key file types include dictionaries, posting lists, forward files, and DocValues. Random disk reads on .fdt, .tim, and .doc files can become bottlenecks; SSDs and sufficient memory caching are recommended.
3.3 Elasticsearch Sharding
Document routing determines the target primary shard: shard = hash(routing) % number_of_primary_shards. Using a consistent _routing value (e.g., the HBase rowkey) keeps related data on the same shard, reducing query load.
Optimization Cases
4.1 Index Performance Optimizations
Bulk write with appropriate batch size (hundreds to thousands of records).
Multi‑threaded ingestion, matching the number of nodes.
Increase refresh_interval (e.g., set to -1) during bulk loads and refresh manually after ingestion.
Allocate ~50 % of node RAM to Lucene file cache; use nodes with 64 GB+ memory.
Prefer SSDs over HDDs for random I/O.
Use custom IDs that map to HBase rowkeys to enable efficient updates/deletes.
Throttle segment merges (e.g., indices.store.throttle.max_bytes_per_sec: "200mb") and limit merge threads based on storage type.
4.2 Query Performance Optimizations
Disable doc values for fields that are never sorted or aggregated.
Prefer keyword fields over numeric types for exact matches; term queries are faster than range queries.
Turn off _source for fields not needed in the response to save disk space.
Use filters or constant_score queries to avoid scoring when relevance is irrelevant.
Pagination strategies: avoid deep from+size (default index.max_result_window: 10000), use search_after for deep navigation, or scroll for large result sets.
Store a combined timestamp‑ID long field to simplify sorting without extra overhead.
Allocate 16 + CPU cores for heavy sorting workloads.
Set merge.policy.expunge_deletes_allowed: 0 to force immediate removal of deleted docs during merges.
{
"mappings": {
"data": {
"dynamic": "false",
"_source": {
"includes": ["XXX"] -- 仅将查询结果所需的数据存储在 _source 中
},
"properties": {
"state": {
"type": "keyword", -- 虽然 state 为 int,但若不做范围查询,使用 keyword 更高效
"doc_values": false -- 关闭不需要的 doc values
},
"b": {
"type": "long" -- 需要范围查询的字段使用 long/int
}
}
}
}, "settings": {......}
}Performance Testing
Single‑node test with 50 M–100 M records to gauge point‑load capacity.
Cluster test with 1 B–3 B records to measure disk I/O, memory, CPU, and network usage.
Random query combinations across data volumes to evaluate latency.
Compare SSD vs. HDD performance.
Production Results After applying the above optimizations, the platform can query 100 rows from a hundred‑billion‑record index within 3 seconds, with fast pagination and the ability to scale by adding nodes when needed.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
