Mastering Elasticsearch: Real-World Indexing & Query Performance Tips

This article shares practical Elasticsearch experience covering index and query performance optimization, shard routing strategies, JVM tuning, daily maintenance, and answers to common production questions, providing actionable guidance for building high‑availability search clusters.

21CTO

Aug 14, 2015

This article compiles practical Elasticsearch experience shared by senior engineer Wang Weihua, focusing on index performance, query performance, and essential configuration tips.

Index Performance

Before optimizing indexing speed, check whether the bottleneck actually lies in the source database; Elasticsearch indexing itself is already fast. Slow indexing after an upgrade may stem from plugins (e.g., the IK analyzer). Recommended optimizations include using SSDs, reducing fragmentation, setting the replica count to 0 during the initial load (and restoring it afterwards), and using bulk indexing when the document creation rate can keep up.
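As a rough illustration of the bulk indexing mentioned above, the sketch below builds the newline-delimited body that the Bulk API expects: one action line followed by one source line per document. The index name `products`, the document fields, and the batch size are hypothetical, and older Elasticsearch releases also expect a `_type` in the action line, so treat this as a starting point rather than a drop-in client.

```python
import json


def build_bulk_body(index_name, docs):
    """Serialize (doc_id, source) pairs into the newline-delimited bulk format."""
    lines = []
    for doc_id, source in docs:
        # Action line: names the target index and document ID for the next line.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # A bulk body must end with a newline after the last line.
    return "\n".join(lines) + "\n"


def chunked(items, size):
    """Yield successive batches so each bulk request stays a manageable size."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # Hypothetical documents; a real loader would read them from the source database.
    docs = [(str(i), {"title": "product %d" % i}) for i in range(5)]
    for batch in chunked(docs, 2):
        print(build_bulk_body("products", batch), end="")
```

Each printed batch would be sent as the body of one bulk request; the batch size is worth tuning against the cluster rather than fixing in advance.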

Key settings: increase threadpool.index.queue_size, set indices.memory.index_buffer_size to 10%, set index.translog.flush_threshold_ops to 50000, and adjust refresh_interval.
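One way to apply the refresh and replica advice is to switch index settings before a large load and restore them afterwards. The sketch below only builds the request for the update-index-settings endpoint; the index name `products` and the restored values (`1s`, one replica) are assumptions for illustration, not values from the talk. The thread pool and index buffer settings listed above are node-level and would go in the node configuration instead.

```python
import json


def bulk_load_settings():
    """Index settings to apply before a large initial load (illustrative values)."""
    # "-1" disables periodic refresh; 0 replicas avoids indexing every document twice.
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def serving_settings(refresh_interval="1s", replicas=1):
    """Index settings to restore once the load has finished."""
    return {"index": {"refresh_interval": refresh_interval,
                      "number_of_replicas": replicas}}


def settings_request(index_name, settings):
    """Return the (method, path, body) triple for an update-index-settings call."""
    return "PUT", "/%s/_settings" % index_name, json.dumps(settings)


if __name__ == "__main__":
    print(settings_request("products", bulk_load_settings()))
    print(settings_request("products", serving_settings()))
```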

Query Performance

Routing is crucial for fast queries; using multiple clusters with different routing dimensions (e.g., user, city, category) distributes load and keeps CPU usage low. Split‑index and merge‑query strategies help manage large shards. Avoid excessive shards (recommended ≤3 per node) and prefer split‑index over many shards.
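The idea behind routing and split-index querying can be sketched as follows. A routing value is hashed and taken modulo the number of primary shards, so every document with the same value (for example, the same user) lands on one shard and a routed search touches only that shard. The hash below is a stand-in chosen for illustration; Elasticsearch uses its own hash function internally, so real shard numbers will differ. The index names and routing values are hypothetical.

```python
import zlib
from urllib.parse import quote


def pick_shard(routing_value, num_primary_shards):
    """Illustrate hash-modulo shard selection with a stand-in hash (CRC32)."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


def routed_search_path(index_name, routing_value):
    """Search path that restricts the query to the shard(s) for one routing value."""
    return "/%s/_search?routing=%s" % (index_name, quote(routing_value, safe=""))


def merged_search_path(index_names):
    """Search path that queries several split indices in a single request."""
    return "/%s/_search" % ",".join(index_names)


if __name__ == "__main__":
    # The same routing value always maps to the same shard.
    print(pick_shard("user_42", 5), pick_shard("user_42", 5))
    print(routed_search_path("orders", "user_42"))
    print(merged_search_path(["orders_2015_07", "orders_2015_08"]))
```

Splitting by a natural dimension such as month keeps each index small, while the comma-separated path still lets one query cover several of them.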

Reduce index size by omitting non-essential fields (e.g., description), and consider a separate cluster for such fields. Limit heavy queries such as range queries, and replace facets with aggregations.
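For the facet-to-aggregation switch, a terms aggregation is the usual counterpart of a terms facet. The sketch below builds such a request body; the field name `category` and the aggregation name are hypothetical, and setting `size` to 0 simply skips returning hits when only the counts are needed.

```python
import json


def terms_aggregation_body(field, top_n=10):
    """Build a search body that counts documents per value of one field."""
    return {
        "size": 0,  # return no hits, only the aggregation result
        "aggs": {
            "by_" + field: {
                "terms": {"field": field, "size": top_n}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(terms_aggregation_body("category", top_n=5), indent=2))
```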

Other Important Settings

Use a fixed thread pool for indexing and searching to avoid shard relocation issues. A daily force-merge with max_num_segments=1 improves query speed. For the JVM, prefer smaller heaps that stay below 32 GB, and consider the G1 or CMS collectors; crossing the 32 GB boundary disables compressed OOPs, so object pointers take up more memory.
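The heap guidance can be turned into a small sizing helper. The sketch below assumes a common rule of thumb (give the heap about half of the machine's RAM, capped safely below the 32 GB boundary); the 31 GB cap and the half-of-RAM ratio are assumptions, since the exact compressed-OOPs threshold depends on the JVM. It also builds the path for a force-merge call; releases that predate the force merge API expose the same operation as `_optimize`. The index name is hypothetical.

```python
def recommended_heap_gb(total_ram_gb, cap_gb=31):
    """Pick a heap size: half of RAM, at least 1 GB, never above the cap."""
    half = total_ram_gb // 2
    return max(1, min(half, cap_gb))


def force_merge_path(index_name, max_num_segments=1):
    """Path for merging an index down to a fixed number of segments."""
    return "/%s/_forcemerge?max_num_segments=%d" % (index_name, max_num_segments)


if __name__ == "__main__":
    for ram in (16, 64, 128):
        print(ram, "GB RAM ->", recommended_heap_gb(ram), "GB heap")
    print("POST", force_merge_path("logs_2015_08"))
```

Leaving the rest of the RAM to the operating system also benefits the filesystem cache that Lucene relies on.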

Enable bootstrap.mlockall to lock the heap in memory so that it cannot be swapped out, and consider SSDs for better indexing throughput.

Plugins & Tools

The Kopf plugin is recommended for cluster management; it provides a user-friendly UI on top of the monitoring APIs.

Q&A Highlights

• Production JVM parameters: CMS settings, ES_HEAP_NEWSIZE, and minimal Full GC impact.
• Hardware recommendations: ample memory, multi-core CPUs, SSDs for indexing, moderate disk capacity.
• Facet vs. aggregation: use aggregations for better performance.
• Reindexing is essentially delete-plus-add at the Lucene level.
• Docker is not used; deployment relies on Puppet and manual network settings.
• Data bus architecture: use Redis + Elasticsearch with routing to keep data consistent across clusters.
• Cluster rebalance settings (e.g., cluster.routing.allocation) should be tuned based on shard count.

Overall, the shared practices aim to keep Elasticsearch clusters low‑pressure, scalable, and capable of handling billions of documents with fast query response.
