Guidelines for Sizing and Benchmarking Elasticsearch Clusters
This article is a practical guide to allocating hardware resources, sizing an Elasticsearch cluster from data volume, and running indexing and search benchmarks. It offers formulas, test configurations, and performance conclusions to help engineers design stable, high-throughput clusters.
1. Hardware Resource Allocation
Performance depends on the tasks Elasticsearch performs and the platform it runs on. Key resources include:
Disk Storage – Prefer SSDs, especially for nodes handling search and indexing; use hot‑warm architecture to control costs; local disks are ideal on bare metal; Elasticsearch does not require RAID, and a single replica shard is sufficient for fault tolerance.
Memory – The JVM heap should use about 50% of available RAM; the remaining RAM is left to the OS filesystem cache, which speeds up full‑text search, aggregations, and sorting.
CPU – The number and speed of CPU cores determine average operation speed and peak throughput.
Network – Bandwidth and latency affect inter‑node communication and cross‑cluster features.
2. Determining Cluster Size Based on Data Volume
For metric and log use cases, estimate daily raw data, retention period, and required replica count. Add 5‑10% for error margin and 15% for disk headroom, plus a spare node for redundancy.
Formulas
Data total (GB) = daily_raw_data(GB) * retention_days * (replicas + 1)
Storage total (GB) = data_total(GB) * (1 + 0.15 + 0.1)
Number of data nodes = ROUNDUP(storage_total(GB) / (node_RAM(GB) * RAM_to_data_ratio)), where the ratio is the GB of data a node can hold per GB of RAM (roughly 30 for hot nodes and 160 for warm nodes, consistent with the examples below).
Examples:
Small cluster : 1 GB/day, 9 months retention, 8 GB RAM per node → 3 nodes.
Large cluster : 100 GB/day, 30 days hot tier, 12 months warm tier, 64 GB RAM per node → 5 hot nodes, 10 warm nodes.
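The formulas and examples above can be sketched as a small Python calculator. This is a minimal sketch, not an official Elastic tool; the RAM‑to‑data ratios (30 for hot, 160 for warm) and the 270/365‑day retention figures are assumptions chosen to reproduce the two examples, and the spare node is added only where the example implies one.

```python
import math

def data_total_gb(daily_gb, retention_days, replicas=1):
    # Data total (GB) = daily_raw_data * retention_days * (replicas + 1)
    return daily_gb * retention_days * (replicas + 1)

def storage_total_gb(data_gb, margin=0.10, headroom=0.15):
    # Add ~10% error margin and 15% disk headroom
    return data_gb * (1 + margin + headroom)

def data_nodes(storage_gb, node_ram_gb, ram_to_data_ratio, spare=0):
    # Each node holds roughly node_ram * ratio GB of data
    per_node_gb = node_ram_gb * ram_to_data_ratio
    return math.ceil(storage_gb / per_node_gb) + spare

# Small cluster: 1 GB/day, ~9 months (270 days), 8 GB RAM, assumed 1:30 hot ratio
small = data_nodes(storage_total_gb(data_total_gb(1, 270)), 8, 30)
# Large cluster, hot tier: 100 GB/day, 30 days, 64 GB RAM, 1:30 ratio, +1 spare
hot = data_nodes(storage_total_gb(data_total_gb(100, 30)), 64, 30, spare=1)
# Large cluster, warm tier: 12 months (365 days), assumed 1:160 warm ratio, +1 spare
warm = data_nodes(storage_total_gb(data_total_gb(100, 365)), 64, 160, spare=1)
print(small, hot, warm)  # → 3 5 10
```

Note that the ratios are the main lever: a warm node on cheap disk can carry far more data per GB of RAM than a hot node that must serve interactive queries.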
3. Benchmark Testing
Before going to production, run benchmarks using Rally. Run separate tests for indexing and for searching.
4. Index Benchmark
The test used a 3‑node cluster (8 vCPU, HDD, 32 GB RAM, 16 GB JVM heap per node) and a Metricbeat dataset: 1.2 GB, 1,079,600 docs.
The optimal bulk batch size was ≈ 12,000 docs (≈ 13.7 MB), indexing ~13,000 events/s with a single client. The best client count was 16, reaching ~62,000 events/s.
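To apply the batch-size finding in an indexing client, documents have to be grouped into fixed-size bulk requests. The helper below is a hypothetical sketch (not part of the article's test harness): it splits any document stream into batches of the ~12,000‑doc size found optimal above, which could then be fed to an Elasticsearch bulk API call.

```python
from itertools import islice

def bulk_batches(docs, batch_size=12_000):
    """Yield lists of up to batch_size docs, one list per _bulk request."""
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# The 1,079,600-doc Metricbeat dataset at 12,000 docs per request ≈ 90 bulk requests
n_requests = sum(1 for _ in bulk_batches(range(1_079_600)))
print(n_requests)  # → 90
```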
Results:
1 node, 1 shard → 22,000 events/s
2 nodes, 2 shards → 43,000 events/s
3 nodes, 3 shards → 62,000 events/s
5. Search Benchmark
Target: 20 clients at 1,000 operations per second (OPS). Queries were evaluated on the Metricbeat and HTTP logs datasets.
Observations:
For Metricbeat, the auto-date-histogram-with-tz query has the longest service time.
For HTTP logs, desc_sort_timestamp and asc_sort_timestamp are slower.
Running queries in parallel increases the 90th‑percentile service time.
When indexing and searching run concurrently, service time rises noticeably.
A combined run (32 indexing clients, 20 search clients) sustained an indexing throughput of ~173,000 events/s alongside 1,000 search requests per second.
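The 90th‑percentile service time under concurrent clients can be measured with a simple harness. This is a minimal sketch, not Rally itself: `query_fn` stands in for a real search call, and the thread counts here are illustrative.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def service_times(query_fn, clients=20, requests_per_client=50):
    """Run query_fn from several concurrent clients; collect per-request times."""
    def worker(_):
        times = []
        for _ in range(requests_per_client):
            t0 = time.perf_counter()
            query_fn()
            times.append(time.perf_counter() - t0)
        return times
    with ThreadPoolExecutor(max_workers=clients) as pool:
        results = pool.map(worker, range(clients))
    return [t for ts in results for t in ts]

def p90(times):
    # 90th-percentile service time (the tail-latency metric discussed above)
    return statistics.quantiles(times, n=10)[-1]

# Stand-in query: replace the lambda with a real Elasticsearch search call
times = service_times(lambda: time.sleep(0.001), clients=4, requests_per_client=10)
print(f"p90 = {p90(times) * 1000:.1f} ms")
```

Comparing the p90 from a search-only run against one taken while bulk indexing is in progress makes the interference described above directly visible.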
6. Conclusion
With this methodology you can systematically derive the required node count from data volume. Benchmark with realistic data and query patterns, and keep a safety margin plus a spare node when planning for future growth.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.