
Elasticsearch Optimization Practices and Performance Tuning Guide

This article presents a comprehensive guide on optimizing Elasticsearch for large‑scale data platforms, covering Lucene fundamentals, index and shard architecture, doc‑values usage, routing strategies, practical performance‑tuning techniques, and real‑world testing results to achieve sub‑second query responses on billions of records.

Big Data Technology & Architecture

The data platform has evolved through three versions, encountering common challenges that prompted the creation of detailed documentation focused on Elasticsearch (ES) optimization.

Project background: A business system generates over a hundred million rows per day, partitioned by day, but can only retain three months of data due to hardware constraints, making cross‑month queries and long‑term data export difficult.

Improvement goals:

Enable cross‑month queries and support exporting more than one year of historical data.

Achieve sub‑second response times for conditional queries.

Elasticsearch and Lucene basics: An ES cluster consists of nodes, indices, shards, and replicas. Each index is divided into one or more shards; every shard is a self-contained Lucene index and the smallest unit that can be searched and allocated. Lucene stores a shard's data in segments, each holding documents whose fields are tokenized into terms.

Key concepts include:

Cluster: a group of nodes.

Node: a single ES service instance.

Index: logical namespace for one or more physical shards.

Shard: a Lucene instance holding a subset of data.

Replica: a copy of a shard for fault tolerance and load balancing.

DocValues: a column-oriented structure that maps a document ID to its field values, which makes sorting, aggregations, and faceting fast.

Lucene's index files consist of term dictionaries, inverted lists (postings), forward files (stored fields), and DocValues. Random disk reads on the .fdt (stored field data), .tim (term dictionary), and .doc (postings) files are costly, so SSDs are recommended.
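The difference between the inverted lists and DocValues can be illustrated with a toy model. This is only a conceptual sketch in Python, not how Lucene lays data out on disk; the documents and the helper names `build_inverted_index` and `build_doc_values` are made up for illustration.

```python
from collections import defaultdict

# Toy documents keyed by document ID; field names and values are made up.
docs = {
    0: {"state": "open", "b": 30},
    1: {"state": "closed", "b": 10},
    2: {"state": "open", "b": 20},
}

def build_inverted_index(docs, field):
    """Map each term to the ascending list of document IDs that contain it."""
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        postings[str(docs[doc_id][field])].append(doc_id)
    return dict(postings)

def build_doc_values(docs, field):
    """Column of field values indexed by document ID (assumes dense IDs 0..n-1)."""
    return [docs[doc_id][field] for doc_id in sorted(docs)]

state_index = build_inverted_index(docs, "state")  # term -> doc IDs, used to search
b_column = build_doc_values(docs, "b")             # doc ID -> value, used to sort

# Search through the inverted index, then order the hits using the column.
hits = state_index["open"]
sorted_hits = sorted(hits, key=lambda doc_id: b_column[doc_id])
```

The search step never touches the column and the sort step never touches the term dictionary, which is why a field that is only filtered on can drop doc_values, and a field that is only sorted on still needs them.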

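Routing and shard allocation: Elasticsearch picks the target shard as shard = hash(_routing) % number_of_primary_shards, where the hash is Murmur3 and _routing defaults to the document ID. Custom routing can co-locate related documents on the same shard, so a search that supplies the routing value touches only that shard instead of fanning out to every shard.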
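The effect of a routing value on shard placement can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical shard count of 5 and made-up document IDs; `zlib.crc32` stands in for the Murmur3 hash that Elasticsearch uses, so the shard numbers it produces will not match a real cluster.

```python
import zlib

def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Return the shard number for a routing value.

    zlib.crc32 is a stand-in hash for this sketch; Elasticsearch uses
    Murmur3, so real shard numbers will differ.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards

NUM_SHARDS = 5  # hypothetical primary shard count
ORDER_IDS = ("order-1", "order-2", "order-3")  # made-up document IDs

# Default behaviour: the routing value is the document ID itself.
by_id = {doc_id: pick_shard(doc_id, NUM_SHARDS) for doc_id in ORDER_IDS}

# Custom routing: every document of one customer shares a routing value,
# so all of them map to the same shard.
by_customer = {doc_id: pick_shard("customer-42", NUM_SHARDS) for doc_id in ORDER_IDS}
```

Because the shard number depends on the primary shard count, that count cannot be changed freely after documents have been indexed without reindexing.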

Optimization case study: The solution stores only searchable fields in ES while keeping the full data in HBase. Performance improvements include bulk writes, multi-threaded ingestion, extending refresh_interval during bulk loads, leaving roughly 50% of node memory to the operating system file cache that Lucene relies on, using SSDs, and tuning merge thread counts.
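The split between a slim search index and a full-record store can be sketched as follows. This is an illustration only: the two dictionaries are in-memory stand-ins rather than real ES or HBase clients, and the field names, row keys, and helper functions are hypothetical.

```python
# Hypothetical set of fields worth indexing; everything else stays out of ES.
SEARCHABLE_FIELDS = {"state", "b"}

def split_record(row_key, record):
    """Return (es_doc, full_row): the slim search document and the full record."""
    es_doc = {name: value for name, value in record.items() if name in SEARCHABLE_FIELDS}
    return es_doc, dict(record)

# In-memory stand-ins for the two stores (not real ES or HBase clients).
search_store = {}  # row key -> searchable fields only
full_store = {}    # row key -> complete record

def ingest(row_key, record):
    """Write the slim document to the search store and the full row to the row store."""
    es_doc, full_row = split_record(row_key, record)
    search_store[row_key] = es_doc
    full_store[row_key] = full_row

def search_then_fetch(field, value):
    """Step 1: find matching row keys in the search store. Step 2: load the full rows."""
    row_keys = [key for key, doc in search_store.items() if doc.get(field) == value]
    return [full_store[key] for key in row_keys]

ingest("row-1", {"state": "open", "b": 1, "payload": "large blob 1"})
ingest("row-2", {"state": "closed", "b": 2, "payload": "large blob 2"})
results = search_then_fetch("state", "open")
```

Using the same key for the ES document ID and the HBase row key turns the second step into a direct key lookup.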

Index‑side tuning recommendations:

Disable unnecessary doc_values and _source fields.

Prefer keyword over numeric types when range queries are not needed.

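Batch writes with the bulk API. During large loads, raise refresh_interval or set it to "-1" to disable automatic refresh, then refresh manually and restore the setting once the load finishes.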

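Provision ample memory (e.g., 64 GB per node) and use SSD storage. Keep the JVM heap to about half of the RAM and below roughly 32 GB, so that compressed object pointers stay enabled and the remainder serves the file cache.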

Configure merge throttling and thread counts based on disk type.
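The batching and refresh recommendations above can be sketched as request bodies. This sketch only builds the payloads and sends nothing; the index name, batch size, sample documents, and the "30s" restore value are assumptions. Older versions that still use mapping types, as in the example mapping in this article, would also need the type in the URL or in the action line.

```python
import json

def bulk_payload(index_name, docs):
    """Serialize (doc_id, source) pairs as a newline-delimited _bulk request body."""
    lines = []
    for doc_id, source in docs:
        # Action line first, then the document source on its own line.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline

def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Bodies for PUT /<index>/_settings, applied before and after the load.
DISABLE_REFRESH = {"index": {"refresh_interval": "-1"}}
RESTORE_REFRESH = {"index": {"refresh_interval": "30s"}}  # assumed target value

docs = [
    ("1", {"state": "open", "b": 1}),
    ("2", {"state": "closed", "b": 2}),
    ("3", {"state": "open", "b": 3}),
]
batches = [bulk_payload("demo-index", chunk) for chunk in chunked(docs, 2)]
```

Each entry in `batches` would be posted to the `_bulk` endpoint, typically from several worker threads. The right batch size depends on document size and is best found by measurement.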

Search‑side tuning recommendations:

Turn off doc values for fields not used in sorting or aggregations.

Map fields that are only matched exactly as keyword and query them with term queries rather than range queries.

Disable _source storage for fields not needed in results.

Run conditions in filter context instead of as scoring queries when relevance scoring is unnecessary; filters skip score computation and their results can be cached.

Prefer search_after over from/size for deep pagination; use scroll to export large result sets.

Combine timestamp and ID into a single long field for efficient sorting.

Allocate ≥16 CPU cores for heavy sorting workloads.

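Set index.merge.policy.expunge_deletes_allowed to 0 so that a force merge run with only_expunge_deletes rewrites every segment that contains deleted documents (by default, only segments with more than 10% deleted documents are rewritten).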
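Three of the recommendations above (filter context, search_after, and a combined timestamp-plus-ID sort field) can be sketched together. This is a sketch under stated assumptions: the field name `sort_key`, the 20-bit sequence width, and both helper functions are hypothetical, and the code only builds a request body without sending it.

```python
SEQ_BITS = 20  # assumed width: up to 2**20 records per millisecond

def pack_sort_key(timestamp_ms: int, sequence: int) -> int:
    """Combine a millisecond timestamp and a per-millisecond sequence into one integer.

    A 41-bit timestamp shifted left by 20 bits stays below 2**63,
    so the result fits an Elasticsearch long field.
    """
    if not 0 <= sequence < (1 << SEQ_BITS):
        raise ValueError("sequence out of range")
    return (timestamp_ms << SEQ_BITS) | sequence

def unpack_sort_key(key: int):
    """Recover (timestamp_ms, sequence) from a packed key."""
    return key >> SEQ_BITS, key & ((1 << SEQ_BITS) - 1)

def build_search(state, page_size, last_sort_key=None):
    """Build a search body that filters without scoring and pages with search_after."""
    body = {
        "size": page_size,
        # Filter context: no relevance scores are computed for this clause.
        "query": {"bool": {"filter": [{"term": {"state": state}}]}},
        # Sorting on one unique long field gives a stable order for paging.
        "sort": [{"sort_key": "asc"}],
    }
    if last_sort_key is not None:
        # Resume after the sort value of the last hit on the previous page.
        body["search_after"] = [last_sort_key]
    return body

earlier = pack_sort_key(1_700_000_000_000, 5)
later = pack_sort_key(1_700_000_000_001, 0)
first_page = build_search("open", 100)
next_page = build_search("open", 100, last_sort_key=earlier)
```

Since the packed key is unique per record, it serves as its own tiebreaker, so no second sort field is needed. The sort field has to keep doc_values enabled, while a field that is only filtered on can disable them.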

Example mapping configuration:

{
    "mappings": {
        "data": {
            "dynamic": "false",
            "_source": {
                "includes": ["XXX"]
            },
            "properties": {
                "state": {
                    "type": "keyword",
                    "doc_values": false
                },
                "b": {
                    "type": "long"
                }
            }
        }
    },
    "settings": {......}
}

Performance testing: Benchmarks include single-node tests with 50-100 M records, cluster tests with up to 3 B records, random combinations of query conditions, and SSD vs. HDD comparisons. Results show that with the applied optimizations, queries over billions of records return 100 rows within 3 seconds, and paging forward and backward remains fast.

The platform is now stable, handling billions of records with sub‑second query latency, and can be further scaled by adding nodes if new bottlenecks appear.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Elasticsearch, performance tuning, Lucene, Index Optimization
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
