
Hot‑Warm Architecture in Elasticsearch 5.x: Node Types, Index Allocation and Curator Automation

The article explains how to design a time‑based Elasticsearch cluster using a hot‑warm architecture with dedicated master, hot, and warm nodes, shows how to configure node attributes, allocate indices via settings or Curator, and discusses best‑practice compression and rollover strategies for large‑scale log data.

Big Data Technology & Architecture

When using Elasticsearch for large time‑series data analysis, it is recommended to employ a time‑based index and a hierarchical Hot‑Warm architecture consisting of three node types—master, hot, and warm (cold) nodes.

Master Nodes

Run three dedicated master nodes for resilience, and set discovery.zen.minimum_master_nodes to 2 (a majority of the three) to avoid split‑brain scenarios. Dedicated master nodes handle only cluster management and cluster state; they hold no data and serve no indexing or search requests, so modest CPU, RAM, and disk resources are sufficient.
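The value 2 follows from the usual quorum rule for master‑eligible nodes: a strict majority, i.e. (master‑eligible nodes / 2) + 1 with integer division. A minimal sketch of that arithmetic follows; the function name is illustrative and not part of any Elasticsearch API.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return the strict majority (quorum) of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    # Integer division, then add one: 3 -> 2, 5 -> 3.
    return master_eligible // 2 + 1


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(n, "master-eligible nodes -> quorum", minimum_master_nodes(n))
```

With three dedicated masters the cluster tolerates the loss of one master while still being able to elect a leader.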

Hot Nodes

Hot nodes handle all indexing and store the most recent indices, which are also the ones queried most frequently. Because indexing is CPU‑ and I/O‑intensive, hot nodes should be equipped with powerful CPUs and SSD storage. At least three hot nodes are recommended for high availability.

Warm (Cold) Nodes

Warm nodes (also called cold nodes) hold large, read‑only indices that are queried infrequently. They typically use high‑capacity spinning disks instead of SSDs. As with hot nodes, a minimum of three warm nodes is advised, and additional nodes may be needed for performance.

Elasticsearch has no built‑in notion of hot or warm; you tag each node with an arbitrary attribute and then filter shard allocation on it. Set the attribute in elasticsearch.yml, e.g., node.attr.box_type: hot on hot nodes or node.attr.box_type: warm on warm nodes, or pass it at startup with ./bin/elasticsearch -Enode.attr.box_type=hot (or warm).

Indices can be forced onto hot nodes with a setting such as:

PUT /logs_2016-12-26
{
  "settings": {
    "index.routing.allocation.require.box_type": "hot"
  }
}

After a few days, the same index can be moved to warm nodes by updating the setting to "warm":

PUT /logs_2016-12-26/_settings
{
  "settings": {
    "index.routing.allocation.require.box_type": "warm"
  }
}
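Both requests above differ only in the path and the attribute value, so a script can generate them. The following is a small sketch that builds the method, path, and JSON body without sending anything; the helper name allocation_request is hypothetical, and the box_type key is the node attribute assumed throughout this article.

```python
import json
from typing import Tuple

ALLOCATION_KEY = "index.routing.allocation.require.box_type"


def allocation_request(index: str, box_type: str) -> Tuple[str, str, str]:
    """Build (method, path, body) that pins an existing index to a node tier."""
    if box_type not in ("hot", "warm"):
        raise ValueError("box_type must be 'hot' or 'warm'")
    body = {"settings": {ALLOCATION_KEY: box_type}}
    return "PUT", "/{}/_settings".format(index), json.dumps(body)


if __name__ == "__main__":
    method, path, body = allocation_request("logs_2016-12-26", "warm")
    print(method, path)
    print(body)
```

The returned triple can be passed to whatever HTTP client is already in use; once the setting changes, Elasticsearch relocates the shards to nodes whose box_type matches.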

If you manage index templates with Logstash or Beats, include the allocation filter in the template, e.g.:

{
  "template" : "indexname-*",
  "version" : 50001,
  "settings" : {
    "index.routing.allocation.require.box_type": "hot"
  }
  ...
}

When an index is no longer written to or frequently searched, it can be migrated from hot to warm nodes by changing the same setting.

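For better compression on warm nodes, use index.codec: best_compression. The source configures it in elasticsearch.yml; note that Elasticsearch 5.x generally rejects index‑level settings in the node configuration file, so it may need to be applied per index instead (index.codec is a static setting, changeable only at index creation or on a closed index). After an index has moved to warm nodes, call the _forcemerge API: merging into fewer segments saves memory, disk space, and file handles, and as a side effect rewrites the segments with the best_compression codec.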
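As a sketch of the merge step, the helper below builds the force‑merge call for a single index. The _forcemerge endpoint and its max_num_segments parameter exist in Elasticsearch 5.x; the helper name is hypothetical and nothing is sent over the network.

```python
from typing import Tuple
from urllib.parse import urlencode


def forcemerge_request(index: str, max_num_segments: int = 1) -> Tuple[str, str]:
    """Build (method, path) for force-merging an index down to N segments per shard."""
    if max_num_segments < 1:
        raise ValueError("max_num_segments must be at least 1")
    query = urlencode({"max_num_segments": max_num_segments})
    return "POST", "/{}/_forcemerge?{}".format(index, query)


if __name__ == "__main__":
    print(*forcemerge_request("logs_2016-12-26"))
```

Force merging is I/O‑heavy, which is one reason to run it only after the index has left the hot nodes and is no longer written to.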

Automation of these moves can be achieved with the Curator tool. An example Curator 4.2 configuration to move indices older than three days from hot to warm nodes:

actions:
  1:
    action: allocation
    description: "Apply shard allocation filtering rules to the specified indices"
    options:
      key: box_type
      value: warm
      allocation_type: require
      wait_for_completion: true
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 3

  2:
    action: forcemerge
    description: "Perform a forceMerge on selected indices to 'max_num_segments' per shard"
    options:
      max_num_segments: 1
      timeout_override: 21600
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 3

Note that for the forcemerge action, timeout_override already defaults to 21600 seconds (six hours); it is spelled out here only to show that it can be adjusted.
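To make the selection logic concrete, here is a rough sketch of what the pattern and age filters in the configuration above select: indices whose name starts with logstash- and whose date suffix, parsed with '%Y.%m.%d', is more than three days before a reference time. This only approximates the idea; Curator's own implementation and its exact boundary handling may differ, and the function name is illustrative.

```python
from datetime import datetime, timedelta
from typing import Iterable, List

PREFIX = "logstash-"
TIMESTRING = "%Y.%m.%d"


def indices_older_than(names: Iterable[str], days: int, now: datetime) -> List[str]:
    """Return prefixed index names whose date suffix is older than `days` days."""
    cutoff = now - timedelta(days=days)
    selected = []
    for name in names:
        if not name.startswith(PREFIX):
            continue  # pattern filter: wrong prefix
        try:
            stamp = datetime.strptime(name[len(PREFIX):], TIMESTRING)
        except ValueError:
            continue  # suffix is not a date in the expected format
        if stamp < cutoff:
            selected.append(name)
    return selected


if __name__ == "__main__":
    names = ["logstash-2016.12.25", "logstash-2016.12.29", "metrics-2016.12.01"]
    print(indices_older_than(names, 3, datetime(2016, 12, 30)))
```

Passing the reference time explicitly keeps the function deterministic, which makes it easy to dry‑run against a list of index names before enabling the real Curator job.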

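Since Elasticsearch 5.0, the Rollover and Shrink APIs provide a simpler way to manage time‑based indices: Rollover creates a new index behind a write alias once the current one meets an age or size condition, and Shrink copies an index into a new one with fewer primary shards.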
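A minimal sketch of the two request shapes follows. The _rollover and _shrink endpoints and the max_age, max_docs, and index.number_of_shards fields are part of the 5.x APIs; the alias logs_write, the index names, and the thresholds are placeholders chosen for illustration, and the helpers only build the requests.

```python
import json
from typing import Tuple


def rollover_request(alias: str, max_age: str, max_docs: int) -> Tuple[str, str, str]:
    """Build (method, path, body) that rolls an alias over when a condition is met."""
    body = {"conditions": {"max_age": max_age, "max_docs": max_docs}}
    return "POST", "/{}/_rollover".format(alias), json.dumps(body)


def shrink_request(source: str, target: str, shards: int) -> Tuple[str, str, str]:
    """Build (method, path, body) that shrinks `source` into `target`."""
    body = {"settings": {"index.number_of_shards": shards}}
    return "POST", "/{}/_shrink/{}".format(source, target), json.dumps(body)


if __name__ == "__main__":
    print(*rollover_request("logs_write", "7d", 100000000))
    print(*shrink_request("logs-000001", "logs-000001-shrunk", 1))
```

Shrink has preconditions that this sketch does not check: the source index must be read‑only, a copy of every shard must sit on a single node, and the target shard count must be a factor of the source shard count.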

Original source: Elastic blog

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
