Hot‑Warm Architecture in Elasticsearch 5.x: Node Types, Index Allocation and Curator Automation
The article explains how to design a time‑based Elasticsearch cluster using a hot‑warm architecture with dedicated master, hot, and warm nodes, shows how to configure node attributes, allocate indices via settings or Curator, and discusses best‑practice compression and rollover strategies for large‑scale log data.
When using Elasticsearch for large time‑series data analysis, it is recommended to employ a time‑based index and a hierarchical Hot‑Warm architecture consisting of three node types—master, hot, and warm (cold) nodes.
Master Nodes
Run three dedicated master nodes for maximum resilience and set discovery.zen.minimum_master_nodes to 2 to avoid split‑brain scenarios. Master nodes handle only cluster management and state, so they can be provisioned with modest CPU, RAM, and disk resources.
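The value 2 follows from the usual majority rule for master-eligible nodes: integer division of the node count by two, plus one. A minimal Python sketch of that arithmetic (the function name is illustrative and not part of Elasticsearch):

```python
def minimum_master_nodes(master_eligible):
    """Majority quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "master-eligible nodes -> quorum", minimum_master_nodes(n))
```

With only two master-eligible nodes the quorum is also 2, so losing either node blocks master election. That is one reason three dedicated masters is the common minimum.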
Hot Nodes
Hot nodes store the most recent indices, which are queried most frequently. Because indexing is CPU‑ and I/O‑intensive, hot nodes should be equipped with powerful CPUs and SSD storage. At least three hot nodes are recommended for high availability.
Warm (Cold) Nodes
Warm nodes (also called cold nodes) hold large, read‑only indices that are queried infrequently. They typically use high‑capacity spinning disks instead of SSDs. As with hot nodes, a minimum of three warm nodes is advised, and additional nodes may be needed for performance.
Elasticsearch identifies a server as hot or warm through an arbitrary node attribute (the name box_type is a convention, not a reserved key). Set it in elasticsearch.yml, e.g., node.attr.box_type: hot for hot nodes or node.attr.box_type: warm for warm nodes, or pass it at startup with ./bin/elasticsearch -Enode.attr.box_type=hot (or warm).
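As a rough illustration, the per-role settings could be generated as below. This is a sketch under assumptions: the node.master, node.data, and node.ingest role flags are standard 5.x node settings but are not mentioned in the original text, and the helper names are hypothetical.

```python
def node_settings(role):
    """Return elasticsearch.yml settings for one role in a hot-warm cluster."""
    if role == "master":
        # Dedicated master: eligible for election, holds no data, runs no ingest pipelines.
        return {"node.master": True, "node.data": False, "node.ingest": False}
    if role in ("hot", "warm"):
        # Data node tagged with the custom attribute used by allocation filtering.
        return {"node.master": False, "node.data": True, "node.attr.box_type": role}
    raise ValueError(f"unknown role: {role!r}")


def to_yaml_lines(settings):
    """Render flat settings as 'key: value' lines with YAML-style booleans."""
    lines = []
    for key, value in settings.items():
        rendered = str(value).lower() if isinstance(value, bool) else str(value)
        lines.append(f"{key}: {rendered}")
    return lines


if __name__ == "__main__":
    for role in ("master", "hot", "warm"):
        print(f"# {role}")
        print("\n".join(to_yaml_lines(node_settings(role))))
```

Keeping the role-to-settings mapping in one place makes it harder for a warm node to start without its box_type attribute, in which case the allocation filter would never place shards on it.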
Indices can be forced onto hot nodes with a setting such as:
PUT /logs_2016-12-26
{
  "settings": {
    "index.routing.allocation.require.box_type": "hot"
  }
}
After a few days, the same index can be moved to warm nodes by updating the setting to "warm":
PUT /logs_2016-12-26/_settings
{
  "settings": {
    "index.routing.allocation.require.box_type": "warm"
  }
}
If you manage index templates with Logstash or Beats, include the allocation filter in the template, e.g.:
{
  "template" : "indexname-*",
  "version" : 50001,
  "settings" : {
    "index.routing.allocation.require.box_type": "hot"
  }
  ...
}
When an index is no longer written to or frequently searched, it can be migrated from hot to warm nodes by changing the same setting.
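The settings update used for this migration can be built programmatically. The sketch below only constructs the request (method, path, JSON body) and sends nothing. The function name is hypothetical, and the body mirrors the "settings" wrapper shown in the examples above.

```python
import json


def allocation_request(index, box_type):
    """Build the settings-update call that retargets an index to a tier."""
    if box_type not in ("hot", "warm"):
        raise ValueError(f"unexpected box_type: {box_type!r}")
    body = {"settings": {"index.routing.allocation.require.box_type": box_type}}
    return "PUT", f"/{index}/_settings", json.dumps(body)


if __name__ == "__main__":
    method, path, body = allocation_request("logs_2016-12-26", "warm")
    print(method, path)
    print(body)
```

The returned tuple can be handed to whichever HTTP client the deployment already uses. Validating box_type up front guards against a typo that would leave shards unassigned because no node carries the misspelled attribute value.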
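For better compression on warm nodes, set index.codec: best_compression and then run the _forcemerge API, which rewrites the merged segments with the best_compression codec. The source places this setting in elasticsearch.yml, but Elasticsearch 5.x rejects index-level settings in node configuration, so it is normally applied per index or through an index template.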
Automation of these moves can be achieved with the Curator tool. An example Curator 4.2 configuration to move indices older than three days from hot to warm nodes:
actions:
  1:
    action: allocation
    description: "Apply shard allocation filtering rules to the specified indices"
    options:
      key: box_type
      value: warm
      allocation_type: require
      wait_for_completion: true
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 3
  2:
    action: forcemerge
    description: "Perform a forceMerge on selected indices to 'max_num_segments' per shard"
    options:
      max_num_segments: 1
      timeout_override: 21600
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 3
Note that timeout_override defaults to 21600 seconds but can be adjusted.
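To make the two filters concrete, here is an approximation in Python of what the prefix filter plus the name-based age filter select. It is a sketch, not Curator's implementation: the function name is hypothetical, and Curator's exact handling of the age boundary may differ from the strict comparison used here.

```python
from datetime import datetime, timedelta


def older_than(indices, prefix, timestring, days, now):
    """Return index names whose embedded date is more than `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    selected = []
    for name in indices:
        if not name.startswith(prefix):
            continue  # fails the prefix pattern filter
        try:
            stamp = datetime.strptime(name[len(prefix):], timestring)
        except ValueError:
            continue  # the rest of the name is not a parseable date
        if stamp < cutoff:
            selected.append(name)
    return selected


if __name__ == "__main__":
    names = [
        "logstash-2016.12.20",
        "logstash-2016.12.22",
        "logstash-2016.12.25",
        "other-2016.12.01",
        "logstash-latest",
    ]
    print(older_than(names, "logstash-", "%Y.%m.%d", 3, datetime(2016, 12, 26)))
```

Because the age is read from the index name rather than from its creation time, the result depends on the naming scheme matching the timestring exactly.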
Since Elasticsearch 5.0, the Rollover and Shrink APIs provide a simpler way to manage time‑based indices and reduce shard counts.
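A short sketch of how these two APIs are typically parameterized. Assumptions: the alias name and thresholds are hypothetical placeholders, the max_age and max_docs rollover conditions are the ones available in 5.x, and the shrink helper encodes the rule that the target's primary shard count must be a factor of the source's. The code only builds values and sends no requests.

```python
def rollover_request(alias, max_age, max_docs):
    """Build a rollover call: a new index is created once any condition is met."""
    body = {"conditions": {"max_age": max_age, "max_docs": max_docs}}
    return "POST", f"/{alias}/_rollover", body


def valid_shrink_targets(source_primaries):
    """Primary shard counts a source index can shrink to (proper factors of its count)."""
    if source_primaries < 1:
        raise ValueError("an index has at least one primary shard")
    return [n for n in range(1, source_primaries) if source_primaries % n == 0]


if __name__ == "__main__":
    # "logs_write" and the thresholds are illustrative values only.
    print(rollover_request("logs_write", "7d", 100000000))
    print(valid_shrink_targets(8))
```

An index with 8 primaries can shrink to 4, 2, or 1, while one with a prime number of primaries can only shrink to 1, which is worth considering when choosing the shard count for hot indices.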
Original source: Elastic blog
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
