Elasticsearch Optimization Practices for Large-Scale Data Platforms

This article explains the architecture of Elasticsearch and Lucene, outlines common performance bottlenecks, and provides concrete indexing and query optimization techniques (shard routing, refresh intervals, doc values, and hardware considerations) for returning conditional queries on billions of records within seconds.

Architect's Tech Stack

1. Introduction

The data platform has gone through three versions and run into many typical problems along the way. This article shares the cleaned-up notes from that work, focusing on Elasticsearch (ES) optimization.

2. Requirements

Project Background

In a business system, some tables generate over a hundred million rows per day; data is partitioned by day, retained for three months, and cross‑month queries are needed.

Improvement Goals

1) Enable cross-month queries and support exporting more than one year of historical data.
2) Return conditional queries within seconds.

3. ES Retrieval Principles

3.1 ES and Lucene Basics

Understanding component fundamentals is essential for pinpointing bottlenecks. ES relies on Lucene for storage and search.

Basic Concepts

Cluster: a group of Nodes.
Node: a service unit within a cluster.
Index: a logical namespace for one or more physical shards.
Type: a classification within an index (a single type per index from ES 6.x).
Document: the smallest indexable JSON unit.
Shards: Lucene instances, each storing a subset of the data.
Replicas: shard copies for safety and load distribution.

3.2 Lucene Index Implementation

Lucene index files consist of dictionaries, inverted lists, forward files, and DocValues.

DocValues provide column-store access for sorting, faceting, and aggregation, but they take up disk space and add work at index time; disabling DocValues on fields that never need them can improve performance.

3.3 ES Index and Search Sharding

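A document's shard is chosen as hash(routing) % number_of_primary_shards. By default the routing value is the document ID, hashed with MurmurHash3. Supplying a custom routing value (the routing request parameter, stored in the _routing metadata field) places related documents on the same shard, so a search that passes the same routing value only has to visit that shard.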
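The routing rule above can be sketched in a few lines. This is only an illustration of the formula: zlib.crc32 stands in for the Murmur3 hash that ES uses, and the document IDs and the "customer-42" routing value are hypothetical.

```python
import zlib


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """Apply hash(routing) % number_of_primary_shards.

    Assumption: crc32 is a stand-in hash chosen because it is deterministic
    and in the standard library. Elasticsearch itself uses Murmur3, so the
    shard numbers here will not match a real cluster.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# Default behaviour: the routing value is the document ID, so documents
# spread across the primary shards.
doc_ids = [f"order-{i}" for i in range(1000)]
spread = {shard_for(doc_id, 5) for doc_id in doc_ids}

# Custom routing: every document written with the same routing value
# lands on the same shard, so a search with that value visits one shard.
customer_shard = shard_for("customer-42", 5)
```

The formula also shows why the number of primary shards cannot be changed freely after indexing: a different modulus would map existing routing values to different shards.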

4. Optimization Cases

In our case, queries are field‑specific (no full‑text search), allowing billion‑row queries to return within seconds.

4.1 Indexing Performance

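1) Write in batches: send hundreds to thousands of records per bulk request.
2) Write from multiple threads; a thread count matching the number of machines is a reasonable starting point.
3) Increase refresh_interval during bulk ingestion (for example, set it to "-1" to turn off automatic refresh), then refresh manually and restore the interval once loading finishes.
4) Leave about 50% of system memory to the operating system's file cache, which Lucene relies on; nodes often need more than 64 GB of RAM.
5) Use SSDs, which handle random I/O far better than spinning disks.
6) Use custom document keys aligned with the HBase rowkeys.
7) Tune segment merging according to disk type: adjust indices.store.throttle.max_bytes_per_sec (where the ES version still supports it) and index.merge.scheduler.max_thread_count.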
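Items 1 and 3 can be sketched as request payloads. The snippet below only builds the bodies and sends nothing. The index name, the "id" field, and the batch size are hypothetical, and the action line uses the typeless form of ES 7.x and later; on 6.x the mapping type would also need to be supplied.

```python
import json
from typing import Dict, Iterable, Iterator, List


def chunked(docs: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield lists of at most `size` documents."""
    batch: List[Dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(index: str, batch: List[Dict]) -> str:
    """Build the NDJSON body for POST /_bulk.

    Each document becomes two lines: an action line, then the source.
    Assumption: every document carries its key in a hypothetical "id" field.
    """
    lines = []
    for doc in batch:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


# Bodies for PUT /<index>/_settings before and after the load.
DISABLE_REFRESH = {"index": {"refresh_interval": "-1"}}
RESTORE_REFRESH = {"index": {"refresh_interval": "1s"}}  # "1s" is the ES default
```

A loader would send DISABLE_REFRESH first, post one bulk_body per batch from chunked(docs, 1000), then call POST /<index>/_refresh and apply RESTORE_REFRESH.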

4.2 Search Performance

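1) Disable DocValues on fields that are never sorted or aggregated on.
2) Prefer keyword over numeric types for fields that are only queried by exact term; numeric types are optimized for range queries.
3) Restrict _source to the fields needed in results, so that fields never returned are not stored in it.
4) Use filters or constant_score queries to avoid scoring overhead.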
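As a minimal sketch of item 4, the helper below builds a search body that wraps exact-match term filters in constant_score, so no relevance score is computed. The function name and the field values are hypothetical; only the query DSL keys (constant_score, filter, bool, term) come from Elasticsearch.

```python
def exact_match_query(filters: dict, size: int = 100) -> dict:
    """Build a search body that matches exact field values without scoring.

    `filters` maps field names to the exact value each field must have.
    All clauses sit in filter context, so they can be cached and are
    combined with AND semantics.
    """
    return {
        "size": size,
        "query": {
            "constant_score": {
                "filter": {
                    "bool": {
                        "filter": [
                            {"term": {field: value}}
                            for field, value in filters.items()
                        ]
                    }
                }
            }
        },
    }


# Hypothetical usage: 100 rows where the keyword field "state" equals "done".
query = exact_match_query({"state": "done"})
```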

4.3 Pagination

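Paging with from+size gets more expensive the deeper the page, because every shard has to collect from+size hits before the coordinating node merges them. The search_after technique instead continues from the sort values of the last hit on the previous page. The scroll API keeps a search context open on the cluster, which suits bulk export better than real-time paging.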
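The search_after flow can be sketched as two request bodies. The sort fields are assumptions: "b" is taken from the sample mapping and "id" is a hypothetical unique keyword field used as a tiebreaker; both would need DocValues enabled to be sortable.

```python
def first_page(size: int) -> dict:
    """Body for the first page: a deterministic sort and no `from`."""
    return {
        "size": size,
        # A unique tiebreaker keeps the order stable between pages.
        "sort": [{"b": "asc"}, {"id": "asc"}],
    }


def next_page(previous_body: dict, last_hit: dict) -> dict:
    """Body for the following page.

    `last_hit` is the final entry of hits.hits from the previous response;
    when a sort is requested, each hit carries its sort values under "sort".
    """
    body = dict(previous_body)  # shallow copy, leaves the original untouched
    body["search_after"] = last_hit["sort"]
    return body


# Hypothetical last hit of the previous page.
last_hit = {"_id": "row-9", "sort": [17, "row-9"]}
page_two = next_page(first_page(100), last_hit)
```

Because each request only carries the last sort values, the per-request cost stays flat however deep the paging goes, at the price of not being able to jump to an arbitrary page.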

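Example mapping that applies the options from section 4.2 (ES 6.x style, with "data" as the mapping type):

{
  "mappings": {
    "data": {
      "dynamic": "false",
      "_source": {"includes": ["XXX"]},
      "properties": {
        "state": {"type": "keyword", "doc_values": false},
        "b": {"type": "long"}
      }
    }
  },
  "settings": {......}
}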

5. Performance Testing

Benchmarks include single‑node tests with 50 M–100 M records, cluster tests up to 3 B records, and comparisons of SSD vs. HDD I/O.

6. Production Impact

The platform now handles billions of records with 100‑row queries returning in under 3 seconds, and scaling can be achieved by adding nodes.

Author: mikevictor Source: http://suo.im/5Aytfg
