Understanding and Improving Elasticsearch Shard Balancing Strategies
This article analyzes Elasticsearch shard imbalance incidents, explains the built‑in shard balancing algorithm and its configuration parameters, demonstrates weight calculations with source code, and proposes practical improvements—including shard count adjustments and a custom load‑aware balancing tool—to achieve more effective cluster load distribution.
From the Incident
During a morning peak, an Elasticsearch cluster raised many query‑timeout alerts, and monitoring showed that only node 123 had a large backlog of queries, with its I/O usage far exceeding other nodes.
Further investigation revealed that high‑load index shards (high QPS, large data, complex queries) were concentrated on node 123, causing severe load imbalance and occasional failures.
ES Shard Balancing Strategy
Concept
Elasticsearch shard balancing distributes index shards evenly across the nodes of a cluster to achieve load balancing and high availability. An index is split into multiple shards, and each shard can be stored and served on a different node. The goal of balancing is to keep the number of shards per node roughly equal, so that no node is overloaded while others sit nearly idle.
Balancing Policy
The core weight calculation uses indexBalance and shardBalance factors. The weight for a node is computed as:
WeightFunction(float indexBalance, float shardBalance) {
    float sum = indexBalance + shardBalance;
    if (sum <= 0.0f) {
        throw new IllegalArgumentException("Balance factors must sum to a value > 0 but was: " + sum);
    }
    theta0 = shardBalance / sum;
    theta1 = indexBalance / sum;
    this.indexBalance = indexBalance;
    this.shardBalance = shardBalance;
}

float weight(Balancer balancer, ModelNode node, String index) {
    final float weightShard = node.numShards() - balancer.avgShardsPerNode();
    final float weightIndex = node.numShards(index) - balancer.avgShardsPerNode(index);
    return theta0 * weightShard + theta1 * weightIndex;
}

The resulting formula is:
weight(node, index) = shard weight factor × (node's total shard count − cluster-wide average) + index weight factor × (node's shard count for the index − cluster-wide average for the index)
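To make the formula concrete, here is a small self-contained sketch (not the real BalancedShardsAllocator class) that applies the default factors 0.45/0.55 to two hypothetical nodes; the allocator prefers the node with the lowest weight:

```java
// Standalone sketch of the weight function above; the node and shard
// counts in main() are invented for illustration.
public class WeightSketch {
    public static float weight(float shardBalance, float indexBalance,
                               int nodeShards, float avgShardsPerNode,
                               int nodeIndexShards, float avgIndexShardsPerNode) {
        float sum = indexBalance + shardBalance;
        float theta0 = shardBalance / sum;  // factor for total shards on the node
        float theta1 = indexBalance / sum;  // factor for this index's shards on the node
        return theta0 * (nodeShards - avgShardsPerNode)
                + theta1 * (nodeIndexShards - avgIndexShardsPerNode);
    }

    public static void main(String[] args) {
        // 3-node cluster, 9 shards total (avg 3); index "logs" has 3 shards (avg 1).
        // Node A: 4 shards, 2 of them from "logs". Node B: 2 shards, none from "logs".
        float wA = weight(0.45f, 0.55f, 4, 3f, 2, 1f); // 0.45*1 + 0.55*1 = 1.0
        float wB = weight(0.45f, 0.55f, 2, 3f, 0, 1f); // 0.45*-1 + 0.55*-1 = -1.0
        System.out.println("node A: " + wA + ", node B: " + wB);
        // Node B has the lower weight, so a new "logs" shard would land on B.
    }
}
```

Because both terms are deviations from a cluster average, a perfectly balanced node scores 0; overloaded nodes score positive and underloaded nodes negative.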
Related Configuration
cluster.routing.allocation.balance.shard: 0.45f (shard weight factor)
cluster.routing.allocation.balance.index: 0.55f (index weight factor)
cluster.routing.allocation.balance.threshold: 1.0f (rebalance trigger threshold)
cluster.routing.allocation.total_shards_per_node: -1 (no limit)
cluster.routing.allocation.cluster_concurrent_rebalance: 2 (max concurrent shard moves)
cluster.routing.allocation.node_concurrent_recoveries: 2 (concurrent incoming and outgoing shard recoveries per node)
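These settings can be changed at runtime through the cluster settings API; the request body below simply restates the defaults as an illustration:

```json
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.balance.shard": 0.45,
    "cluster.routing.allocation.balance.index": 0.55,
    "cluster.routing.allocation.balance.threshold": 1.0,
    "cluster.routing.allocation.cluster_concurrent_rebalance": 2
  }
}
```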
Rebalance Enable Setting
public Decision canRebalance(ShardRouting shardRouting, RoutingAllocation allocation) {
    if (allocation.ignoreDisable()) {
        return allocation.decision(Decision.YES, NAME, "allocation is explicitly ignoring any disabling of rebalancing");
    }
    Settings indexSettings = allocation.metadata().getIndexSafe(shardRouting.index()).getSettings();
    final Rebalance enable;
    final boolean usedIndexSetting;
    if (INDEX_ROUTING_REBALANCE_ENABLE_SETTING.exists(indexSettings)) {
        enable = INDEX_ROUTING_REBALANCE_ENABLE_SETTING.get(indexSettings);
        usedIndexSetting = true;
    } else {
        enable = this.enableRebalance;
        usedIndexSetting = false;
    }
    switch (enable) {
        case ALL:
            return allocation.decision(Decision.YES, NAME, "all rebalancing is allowed");
        case NONE:
            return allocation.decision(Decision.NO, NAME, "no rebalancing is allowed due to %s", setting(enable, usedIndexSetting));
        case PRIMARIES:
            if (shardRouting.primary()) {
                return allocation.decision(Decision.YES, NAME, "primary rebalancing is allowed");
            } else {
                return allocation.decision(Decision.NO, NAME, "replica rebalancing is forbidden due to %s", setting(enable, usedIndexSetting));
            }
        case REPLICAS:
            if (!shardRouting.primary()) {
                return allocation.decision(Decision.YES, NAME, "replica rebalancing is allowed");
            } else {
                return allocation.decision(Decision.NO, NAME, "primary rebalancing is forbidden due to %s", setting(enable, usedIndexSetting));
            }
        default:
            throw new IllegalStateException("Unknown rebalance option");
    }
}
Disk Watermark Settings
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: "85%" (no new shards are allocated to nodes above this level)
cluster.routing.allocation.disk.watermark.high: "90%" (shards are relocated away from nodes above this level)
public Decision canAllocate(ShardRouting shardRouting, RoutingNode node, RoutingAllocation allocation) {
    ...
    // low watermark: stop allocating new shards to this node
    if (freeDiskPercentage < diskThresholdSettings.getFreeDiskThresholdLow()) {
        return allocation.decision(Decision.NO, NAME, "the node is above the low watermark ...");
    }
    ...
}

// The high-watermark check lives in canRemain(), which forces shards off an over-full node
public Decision canRemain(ShardRouting shardRouting, RoutingNode node, RoutingAllocation allocation) {
    ...
    if (freeBytes < diskThresholdSettings.getFreeBytesThresholdHigh().getBytes()) {
        return allocation.decision(Decision.NO, NAME, "the shard cannot remain on this node because it is above the high watermark ...");
    }
    ...
}
Trigger Conditions
Even when the weight formula is satisfied, rebalance occurs only if the rebalance enable setting permits it (e.g., cluster.routing.rebalance.enable: all ) and the node’s disk usage is below the configured watermarks.
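For example, both preconditions can be set explicitly via the cluster settings API (the values shown here are the defaults):

```json
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.rebalance.enable": "all",
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%"
  }
}
```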
How to Improve
Adjust Shard Count
Ensure the total number of primary shards is a multiple of the number of data nodes, so the built-in balancer can distribute shards evenly. For example, with 6 data nodes, an index with 12 primaries and 1 replica places exactly 4 shards (2 primaries and 2 replicas) on each node.
Custom Balancing Tool
Design a tool that balances based on actual load metrics (CPU, I/O, network, RAM) rather than shard count alone. Define a load index for each node:
E_i = o × IO_i + d × MBPS_i + c × CPU_i + r × RAM_i

where IO_i, MBPS_i, CPU_i, and RAM_i are node i's I/O utilization, network throughput, CPU usage, and memory usage, and o, d, c, and r are tunable weighting coefficients. When one node's load index exceeds another's by more than 50% and its own load is above 50%, migrate a high-load shard from the hot node to the cooler one, preferably during a low-traffic window.
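A minimal sketch of such a tool's decision step, assuming metrics normalized to [0, 1] and example coefficients o = 0.4, d = 0.2, c = 0.3, r = 0.1 (the article does not fix these values, and "exceeds by more than 50%" is read here as an absolute difference):

```java
// Hypothetical sketch of the load index E_i = o*IO + d*MBPS + c*CPU + r*RAM.
// Coefficient values and the threshold interpretation are assumptions.
public class LoadBalancerSketch {
    static final double O = 0.4, D = 0.2, C = 0.3, R = 0.1; // example weights, sum to 1

    public static double loadIndex(double io, double mbps, double cpu, double ram) {
        // All inputs are assumed to be utilization ratios in [0, 1].
        return O * io + D * mbps + C * cpu + R * ram;
    }

    /** Migrate only if the hot node is above 50% load and leads the cold node by more than 50%. */
    public static boolean shouldMigrate(double hotLoad, double coldLoad) {
        return hotLoad > 0.5 && (hotLoad - coldLoad) > 0.5;
    }

    public static void main(String[] args) {
        double hot = loadIndex(0.9, 0.8, 0.95, 0.7);  // a saturated node -> 0.875
        double cold = loadIndex(0.1, 0.1, 0.2, 0.2);  // a mostly idle node -> 0.14
        System.out.println(shouldMigrate(hot, cold)); // prints "true"
    }
}
```

In a real tool the metrics would come from node stats sampling, and the migration itself could be issued through the cluster reroute API during a low-traffic window.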
Future Outlook
Intelligent Balancing Strategies
Future tools may incorporate advanced algorithms to evaluate node load more accurately and decide shard movements.
Real‑time Monitoring and Feedback
More frequent monitoring, flexible alerts, and rapid response mechanisms will enable quicker detection of imbalance.
Dynamic Migration Windows
Automatically select low‑traffic periods for shard relocation to minimize impact on production workloads.
Customizable Policies
Allow users to tune thresholds, specify per‑index rules, and enable or disable automatic moves for critical indices.
Conclusion
Elasticsearch’s default shard balancer only equalizes shard counts, ignoring data size, hotness, and hardware differences, which can lead to performance degradation. Effective load balancing must consider actual resource usage, prioritize hot indices, and minimize migration overhead.
References
Elasticsearch source code.
ZCY Technology
ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.