Master Elasticsearch Performance: Practical Production‑Level Optimization Guide
This guide presents a production‑grade, step‑by‑step approach to boost Elasticsearch performance, covering advanced index design, mapping best practices, query and aggregation tuning, JVM and cluster settings, bulk write optimization, monitoring, and real‑world log‑system scenarios with concrete code examples and configuration snippets.
Index Design (Advanced)
Shard Count Engineering Formula
Formula: target_shards ≈ total_data / 30GB
Example: 1 year of data ≈ 3TB → 3000GB → 100 primary shards. Recommended split: time‑based rollover, daily or weekly indices, 3‑6 primary shards per index.
PUT logs-2026.01.31 {
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1
}
}

Core principle: more shards is not always better; query cost = cross‑shard concurrency + merge overhead.
Mapping Pitfalls (90% of performance issues)
Common mistakes
Using text for all string fields
Relying on the default text+keyword dual field
Enabling fielddata on large fields
Recommended template
"status": { "type": "keyword", "norms": false },
"title": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart" }High‑frequency optimizations:
Set index: false for fields not needed in search
Set doc_values: false for fields not used for sorting
Disable norms when scoring is unnecessary
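A minimal mapping sketch combining these switches, assuming the hypothetical fields raw_payload (never searched, sorted, or aggregated) and description (no relevance scoring needed):

# raw_payload and description are hypothetical illustration fields
PUT logs-2026.01.31/_mapping
{
  "properties": {
    "raw_payload": { "type": "keyword", "index": false, "doc_values": false },
    "description": { "type": "text", "norms": false }
  }
}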
Query Optimization (Quick Wins)
Proper Filter + Query Combination
{
"query": {
"bool": {
"must": [{ "match": { "title": "elasticsearch" } }],
"filter": [
{ "term": { "status": "ONLINE" } },
{ "range": { "create_time": { "gte": "now-7d" } } }
]
}
}
}

Benefits: filter cache hits, no score contribution, significant CPU reduction.
Deep Pagination Alternatives
search_after for user-facing paging (no global deep sort)
scroll for full exports (sequential scan)
point in time (PIT) for a consistent view while paging
"search_after": ["2026-01-31T10:00:00", 12345]
Prohibited: from: 100000 (deep pagination). A full search_after request is sketched below.
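A minimal search_after request sketch, assuming a create_time sort plus a hypothetical numeric id field as tiebreaker; the first page is fetched without search_after, and each page's last sort values feed the next request:

# "id" below is a hypothetical numeric tiebreaker field
GET logs-*/_search
{
  "size": 20,
  "query": { "range": { "create_time": { "gte": "now-7d" } } },
  "sort": [
    { "create_time": "desc" },
    { "id": "desc" }
  ],
  "search_after": ["2026-01-31T10:00:00", 12345]
}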
Aggregation Performance Tricks
Control field cardinality
User ID / order number are high‑cardinality fields; avoid aggregating directly on them.
Tune shard_size
"terms": { "field": "user_id", "size": 20, "shard_size": 200 }

Split query execution
Apply filters first
Run aggregations after filtering (see the sketch after this list)
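A sketch combining the filter context with a terms aggregation so the aggregation runs only over the filtered subset; field names reuse the examples above, and size: 0 skips returning hits:

GET logs-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "status": "ONLINE" } },
        { "range": { "create_time": { "gte": "now-7d" } } }
      ]
    }
  },
  "aggs": {
    "top_users": {
      "terms": { "field": "user_id", "size": 20, "shard_size": 200 }
    }
  }
}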
Cluster & JVM Tuning (Stability Core)
JVM Heap Best Practice
-Xms16g
-Xmx16g

Reason: a heap below 32 GB enables compressed oops, reduces GC frequency, and stabilizes latency.
Essential OS Settings
vm.max_map_count=262144
vm.swappiness=1
ulimit -n 65536
bootstrap.memory_lock: true

Circuit Breaker Settings
indices.breaker.total.limit: 70%
indices.breaker.request.limit: 40%
indices.breaker.fielddata.limit: 30%

Prevents OOM and long JVM full‑GC pauses caused by large aggregations.
Write Performance (Logs / Tracing)
Bulk API Best Practices
POST logs-2026.01.31/_bulk
{ "index": {} }
{ "title": "doc1" }

Batch size 5‑15 MB
Concurrent bulk controlled by thread‑pool size
Retry on failures
thread_pool.write.queue_size: 1000

Write‑time Optimizations
"refresh_interval": "30s",
"number_of_replicas": 0After ingestion, restore replicas to 1 and refresh interval to 1 s.
Monitoring & Load‑Testing Loop
Key Monitoring Metrics
JVM GC time
Search latency P99
Segment count
Thread‑pool rejections
GET _nodes/stats
GET _cat/indices?v

Slow Query Log (must enable)
index.search.slowlog.threshold.query.warn: 1s
index.search.slowlog.threshold.fetch.warn: 500ms

Load‑testing Recommendations
Tools: JMeter, Rally
Scenarios: high‑concurrency search, large aggregations, mixed write‑query workloads
Common Pitfalls Summary
Too many shards
Aggregating on text fields
Deep pagination with from
Excessively large JVM heap
Missing slow‑log configuration
Optimization order: Mapping → Query → Shards → JVM → Hardware
Scenario: Log System (High Write + Conditional Query)
Goals & Constraints
Write‑first, query‑second
Acceptable latency 30 s – 1 min
Hot data retained 7 – 30 days
Query pattern relatively fixed
Index & Lifecycle Design
Index Splitting Strategy
Time‑based rollover (mandatory)
Recommended daily indices
logs-2026.01.31
logs-2026.02.01

Shard Design (template)
Daily data < 100 GB → 3 primary shards
100 GB – 300 GB → 6 primary shards
> 300 GB → 9 – 12 primary shards
Target shard size 20 – 40 GB (a sizing example follows)
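For example, assuming roughly 150 GB of logs per day (the middle tier above), a daily index would get 6 primary shards; a sketch, since in practice these settings usually live in the index template and ILM rollover shown next:

PUT logs-2026.02.01
{
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 1
  }
}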
ILM Policy (must)
PUT _ilm/policy/logs_policy
{
"policy": {
"phases": {
"hot": { "actions": { "rollover": { "max_size": "40gb", "max_age": "1d" } } },
"warm": { "min_age": "7d", "actions": { "forcemerge": { "max_num_segments": 1 } } },
"delete": { "min_age": "30d", "actions": { "delete": {} } }
}
}
}

Mapping Template for Logs
PUT _index_template/logs_template
{
"index_patterns": ["logs-*"],
"template": {
"settings": { "refresh_interval": "30s" },
"mappings": {
"properties": {
"@timestamp": { "type": "date" },
"level": { "type": "keyword" },
"service": { "type": "keyword" },
"host": { "type": "keyword" },
"trace_id": { "type": "keyword" },
"message": { "type": "text", "analyzer": "standard" }
}
}
}
}

All filter fields use keyword; only message uses full‑text search; avoid aggregating on text fields.
Write Optimizations
Bulk batch 5‑15 MB
Concurrent bulk threads ≈ CPU cores / 2
"number_of_replicas": 0
After peak load, restore replicas to 1.
Query Template (optimal)
{
"query": {
"bool": {
"filter": [
{ "term": { "service": "order-service" } },
{ "range": { "@timestamp": { "gte": "now-1h" } } }
],
"must": [{ "match": { "message": "error" } }]
}
}
}

Strong time + service filter; only message contributes to scoring.
JVM & Node Role Recommendations
Data node: 16 – 31 GB heap
Master node: 4 – 8 GB heap
Coordinating node (dedicated, recommended for the log system): 8 – 16 GB heap (role settings sketched below)
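A sketch of the matching node.roles settings in elasticsearch.yml (one block per node type; on recent Elasticsearch versions an empty roles list yields a coordinating‑only node):

# data node
node.roles: [ data ]

# dedicated master node
node.roles: [ master ]

# coordinating-only node
node.roles: [ ]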
Monitoring & Alerting Focus
Write rejections > 0
JVM GC > 5 %
Segment count surge
Disk usage > 70 % (check commands sketched below)
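A few standard _cat / stats check commands that cover these signals:

# write rejections per node
GET _cat/thread_pool/write?v&h=node_name,name,active,queue,rejected
# JVM heap and GC stats
GET _nodes/stats/jvm
# per-index segment counts
GET _cat/segments/logs-*?v
# disk usage per node
GET _cat/allocation?v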