
Mastering ELK Stack: From Installation to Advanced Sharding Strategies

This guide introduces ELK stack fundamentals, explains the roles of Elasticsearch, Logstash, and Kibana, and walks through environment preparation, installation and configuration, head plugin setup, shard and replica concepts, and scaling recommendations. It also provides scripts for monitoring cluster health, making it a hands-on reference for log analytics operations.

Raymond Ops

Aug 22, 2025


ELK Stack Overview

ELK is a combination of three open‑source projects—Elasticsearch, Logstash, and Kibana—used for real‑time full‑text search, log collection, and visualisation. Its main advantages include flexible processing, simple JSON‑based configuration, high‑performance retrieval, linear cluster scaling, and an attractive front‑end UI.

Log Collection Software

Common options include the ELK stack, Flume, and Rizhiyi (日志易).

Component Definitions

Elasticsearch: a highly scalable open-source search and analytics engine. It stores logs, provides distributed high availability, and exposes APIs for querying large volumes of log data such as Nginx, Tomcat, and system logs.

Logstash (or the lightweight Filebeat shipper): collects and forwards logs, supports filter plugins, and can parse plain-text or custom JSON logs.

Kibana: visualises data from Elasticsearch through a web UI, enabling graphical log dashboards.

Deploying Elasticsearch

Prepare two hosts (ELKstack01 and ELKstack02) with the following roles and IPs:

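| Hostname   | External IP | Internal IP | Role           | Applications       |
|------------|-------------|-------------|----------------|--------------------|
| ELKstack01 | 10.0.0.81   | 172.16.1.81 | ES log storage | JDK, Elasticsearch |
| ELKstack02 | 10.0.0.82   | 172.16.1.82 | ES log storage | JDK, Elasticsearch |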

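# 0. Install a JDK (Elasticsearch 5.x requires Java 8)
yum install -y java-1.8.0-openjdk

# 1. Create the Elasticsearch repository file
vim /etc/yum.repos.d/es.repo
[elasticsearch-5.x]
name=Elasticsearch repository for 5.x packages
baseurl=https://artifacts.elastic.co/packages/5.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

# 2. Install Elasticsearch
yum install -y elasticsearch

# 3. Edit the configuration (shown for ELKstack02; use node.name: es01 on ELKstack01)
vim /etc/elasticsearch/elasticsearch.yml
cluster.name: elkstack
node.name: es02
path.data: /data/es/data
path.logs: /data/es/logs
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
network.host: 0.0.0.0
http.port: 9200
discovery.zen.ping.unicast.hosts: ["10.0.0.81","10.0.0.82"]

# 4. In the existing [Service] section, allow memory locking
#    (only takes effect if bootstrap.memory_lock is set to true)
vim /usr/lib/systemd/system/elasticsearch.service
LimitMEMLOCK=infinity

# 5. Create data and log directories
mkdir -p /data/es/{logs,data}

# 6. Give the elasticsearch user ownership of them
chown -R elasticsearch:elasticsearch /data/es

# 7. Raise memory-lock and open-file limits for login sessions
#    (services started by systemd use the Limit* settings in the unit file instead)
vim /etc/security/limits.conf
* soft memlock unlimited
* hard memlock unlimited
* soft nofile 131072
* hard nofile 131072

# 8. Set the JVM heap (keep -Xms and -Xmx equal)
vim /etc/elasticsearch/jvm.options
-Xms1g
-Xmx1g

# 9. Reload systemd, then enable and start Elasticsearch
systemctl daemon-reload
systemctl enable elasticsearch
systemctl start elasticsearch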
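Because only `node.name` differs between the two hosts, the per-node `elasticsearch.yml` can be generated from the host table instead of being edited by hand on each machine. The following is a minimal sketch, assuming the paths and IPs used above; the `HOSTS` mapping and the `render_config` helper are hypothetical illustrations, not part of Elasticsearch.

```python
# Hypothetical helper: render elasticsearch.yml for each host in the table above.
HOSTS = {
    "es01": "10.0.0.81",  # ELKstack01
    "es02": "10.0.0.82",  # ELKstack02
}


def render_config(node_name: str, hosts: dict) -> str:
    """Return the elasticsearch.yml text for one node (settings copied from step 3)."""
    unicast = ",".join(f'"{ip}"' for ip in hosts.values())
    lines = [
        "cluster.name: elkstack",
        f"node.name: {node_name}",  # the only per-node difference
        "path.data: /data/es/data",
        "path.logs: /data/es/logs",
        "bootstrap.memory_lock: false",
        "bootstrap.system_call_filter: false",
        "network.host: 0.0.0.0",
        "http.port: 9200",
        f"discovery.zen.ping.unicast.hosts: [{unicast}]",
    ]
    return "\n".join(lines) + "\n"


# Print one config per node; redirect each into /etc/elasticsearch/elasticsearch.yml on its host.
for name in HOSTS:
    print(f"# --- {name} ---")
    print(render_config(name, HOSTS), end="")
```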

Verify that the node responds at http://10.0.0.82:9200/ (and at http://10.0.0.81:9200/ on ELKstack01).
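Responding on port 9200 does not by itself prove that the two nodes found each other. One way to check, sketched below in Python, is to read `cluster_name` and `number_of_nodes` from the `_cluster/health` response. The `nodes_joined` and `fetch_health` helpers and the expected count of 2 are assumptions for this two-host setup.

```python
# Sketch: confirm that both nodes have joined the "elkstack" cluster.
import json
import urllib.request

EXPECTED_NODES = 2  # ELKstack01 + ELKstack02 (assumption for this guide's setup)


def nodes_joined(health: dict, expected: int = EXPECTED_NODES) -> bool:
    """True if a _cluster/health response reports the expected cluster name and node count."""
    return (
        health.get("cluster_name") == "elkstack"
        and health.get("number_of_nodes", 0) >= expected
    )


def fetch_health(host: str) -> dict:
    """GET http://<host>:9200/_cluster/health and decode the JSON body."""
    with urllib.request.urlopen(f"http://{host}:9200/_cluster/health", timeout=5) as resp:
        return json.load(resp)
```

For example, `nodes_joined(fetch_health("10.0.0.82"))` should return `True` once both nodes are up. If it returns `False`, check `discovery.zen.ping.unicast.hosts` and the firewall on port 9300, which is the default transport port.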

Installing the Head Plugin

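# Install npm (on CentOS this package may require the EPEL repository)
yum install -y npm
# Clone the repository
git clone https://github.com/mobz/elasticsearch-head.git
# (Alternatively, download the ZIP archive and run: unzip elasticsearch-head-master.zip)
cd elasticsearch-head
# Install dependencies, including Grunt
npm install
# Start the front-end in the background (listens on port 9100)
npm run start &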

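If `npm install` fails with a bzip2-related error (typically while unpacking PhantomJS), install bzip2 (yum install -y bzip2) and rerun it. Elasticsearch 5.x also blocks cross-origin requests by default. For the plugin to connect, add http.cors.enabled: true and http.cors.allow-origin: "*" to elasticsearch.yml, then restart Elasticsearch.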

Open the plugin UI at http://10.0.0.81:9100/.

Replica Shards

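Replica shards provide fault tolerance. If the node holding a primary shard fails, one of its replicas is promoted to primary. Replicas also serve read (search) requests, which can improve query throughput. However, each replica is a full copy of its primary, so replicas multiply disk and memory requirements without increasing the amount of unique data an index can hold.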
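The cost side of replicas is simple arithmetic, sketched here in Python. `replica_plan` is a hypothetical helper. It relies on one Elasticsearch rule: two copies of the same shard are never placed on the same node, so at most `nodes - 1` replicas per shard can be assigned.

```python
# Sketch: how replicas change shard copies, storage, and assignability.
def replica_plan(primaries: int, replicas: int, nodes: int, data_gb: float) -> dict:
    total_copies = primaries * (1 + replicas)  # every replica is a full copy of a primary
    storage_gb = data_gb * (1 + replicas)      # disk needed grows with each replica
    # A replica is never allocated to the node holding its primary (or another copy),
    # so at most nodes - 1 replicas per shard can be assigned.
    assignable = min(replicas, nodes - 1)
    return {
        "total_shard_copies": total_copies,
        "storage_gb": storage_gb,
        "unique_data_gb": data_gb,  # unchanged: replicas add no capacity
        "unassigned_replicas_per_shard": replicas - assignable,
    }
```

With this guide's two nodes, `replica_plan(5, 1, 2, 100)` gives 10 shard copies and 200 GB of storage for 100 GB of unique data. On a single node, `replica_plan(5, 1, 1, 100)` reports one unassigned replica per shard, which is the situation that turns cluster health yellow.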

Elasticsearch Working Mechanism

Elasticsearch uses an inverted index where each term maps to the documents containing it, enabling full‑text search. The index stores term frequencies, document lengths, and other statistics used for scoring.
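The following is a toy Python model of the idea. It maps each term to the documents containing it, records per-document term frequencies and document lengths, and answers a query by intersecting posting lists. This is a simplified illustration; Lucene's real on-disk structures and scoring are far more elaborate.

```python
# Toy inverted index: term -> {doc_id: term frequency}, plus document lengths.
from __future__ import annotations

import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(docs: dict[int, str]):
    postings = defaultdict(dict)  # term -> {doc_id: term frequency}
    doc_lengths = {}              # doc_id -> token count (a statistic used in scoring)
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        doc_lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            postings[term][doc_id] = tf
    return postings, doc_lengths


def search(postings, query: str) -> set[int]:
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(postings.get(terms[0], {}))
    for term in terms[1:]:
        result &= set(postings.get(term, {}))
    return result
```

For example, after indexing `{1: "nginx error upstream timeout", 2: "tomcat error OutOfMemory", 3: "nginx access ok"}`, a search for `"nginx error"` returns `{1}`. Only the posting lists for the query terms are read; no document is scanned.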

Segments

Once written, a Lucene inverted index is immutable. Instead of rewriting one large index on every change, Lucene stores each index as a set of smaller immutable segments. New documents are first buffered in memory and then periodically written out as a new segment. A commit point lists all segments that are available for search.
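The buffer, segment, and commit-point cycle can be modelled in a few lines of Python. `ToyShard` is a hypothetical teaching class, not an Elasticsearch or Lucene API. It also merges two steps that real Elasticsearch keeps separate: "refresh" (making new segments searchable) and "flush/commit" (making them durable on disk).

```python
# Toy model of Lucene-style segments: an in-memory buffer that is written out as a
# new immutable segment, plus a commit point listing the segments open for search.
from __future__ import annotations


class ToyShard:
    def __init__(self) -> None:
        self.buffer: list[str] = []                # new docs, not yet searchable
        self.segments: list[tuple[str, ...]] = []  # immutable segments (tuples)
        self.commit_point: list[int] = []          # indexes of searchable segments

    def index(self, doc: str) -> None:
        self.buffer.append(doc)

    def commit(self) -> None:
        """Write buffered docs as a new immutable segment and record it in the commit point."""
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.commit_point.append(len(self.segments) - 1)
            self.buffer = []

    def search(self, word: str) -> list[str]:
        """Search every segment listed in the commit point; buffered docs are not visible."""
        return [
            doc
            for i in self.commit_point
            for doc in self.segments[i]
            if word in doc.split()
        ]
```

Existing segments are never modified. Each `commit()` only appends a new one, which is why a real shard accumulates many small segments and Lucene merges them in the background.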

Shard and Replica Configuration

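When creating an index, decide on the number of primary shards (default 5 in Elasticsearch 5.x) and replicas (default 1). A common rule of thumb is to keep each shard at or below about 30 GB. This is a shard-sizing guideline and is separate from the advice to keep the JVM heap below about 30-32 GB so that compressed object pointers stay enabled. For a 200 GB dataset, 7-8 primary shards are recommended.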
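The 200 GB figure comes from dividing the data volume D by the target maximum shard size S_max. Both symbols are introduced here for illustration:

```latex
N_{\text{shards}} = \left\lceil \frac{D}{S_{\max}} \right\rceil
  = \left\lceil \frac{200\ \text{GB}}{30\ \text{GB}} \right\rceil
  = \lceil 6.67 \rceil = 7,
\qquad
\frac{200\ \text{GB}}{8} = 25\ \text{GB per shard}
```

Rounding up gives 7 shards of about 28.6 GB each. Choosing 8 leaves headroom at 25 GB per shard.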

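Do not over-allocate shards in anticipation of future growth; add nodes when needed instead. The number of primary shards is fixed when an index is created (changing it later requires reindexing or the shrink/split APIs). By contrast, number_of_replicas can be changed on a live index at any time, so start with a moderate replica count and adjust it as the cluster evolves.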
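Changing the replica count on a live index goes through the index settings API (`PUT /<index>/_settings`). The following is a minimal sketch using only the Python standard library. The index name `nginx-log` used below is a hypothetical example, and the helper names are illustrative.

```python
# Sketch: change number_of_replicas on an existing index via the index settings API.
import json
import urllib.request


def build_replica_update(host: str, index: str, replicas: int) -> urllib.request.Request:
    """Build a PUT /<index>/_settings request that sets index.number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:9200/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def set_replicas(host: str, index: str, replicas: int) -> dict:
    """Send the request and return Elasticsearch's JSON reply (normally {"acknowledged": true})."""
    with urllib.request.urlopen(build_replica_update(host, index, replicas), timeout=10) as resp:
        return json.load(resp)
```

For example, `set_replicas("10.0.0.81", "nginx-log", 0)` drops replicas during a bulk import. Setting the value back to 1 afterwards restores redundancy.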

Cluster Health Monitoring

# Python script to check cluster health (prints 0 = green, 1 = yellow, 2 = red or unreachable)
import json
import urllib.request

cluster_ip = "10.0.0.81"
url = f"http://{cluster_ip}:9200/_cluster/health"

try:
    with urllib.request.urlopen(url, timeout=5) as resp:
        status = json.load(resp).get("status")
except OSError:
    status = None  # node unreachable: report it like red

if status == "green":
    print("\033[1;32m 0 \033[0m")
elif status == "yellow":
    print("\033[1;33m 1 \033[0m")
else:
    print("\033[1;31m 2 \033[0m")

# Or query the API directly from the shell:
curl -s -XGET 'http://10.0.0.81:9200/_cluster/health?pretty=true'

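Cluster health colors: green (all primary and replica shards are allocated), yellow (all primaries are allocated but at least one replica is not), red (at least one primary shard is unallocated, so some data is unavailable).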

Practical Sharding Example

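# Create an index template that applies these shard/replica counts to every NEW index ("*")
# (Elasticsearch 5.x syntax; 6.x and later use "index_patterns" instead of "template")
curl -XPUT -H 'Content-Type: application/json' -d '{
  "template": "*",
  "settings": {
    "index": {
      "number_of_shards": 6,
      "number_of_replicas": 1
    }
  }
}' http://10.0.0.81:9200/_template/my_template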

Size the shard count relative to the number of nodes (for example, 1.5 to 3 shards per node) while keeping each shard under about 30 GB. When a query has to touch many shards, SSDs and multi-core CPUs help keep it fast.
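The two rules of thumb can be combined into a quick calculation. The sketch below is a hypothetical heuristic, not an Elasticsearch feature. It takes the size-driven minimum, raises it to at least 1.5 shards per node, and flags when the result exceeds 3 shards per node.

```python
# Hypothetical heuristic combining the two rules of thumb above:
#   * keep each primary shard at or below ~30 GB
#   * aim for roughly 1.5-3 shards per node
from __future__ import annotations

import math

MAX_SHARD_GB = 30


def suggest_shards(data_gb: float, nodes: int) -> tuple[int, bool]:
    """Return (shard_count, fits_per_node_range)."""
    by_size = math.ceil(data_gb / MAX_SHARD_GB)  # fewest shards that keep each <= 30 GB
    low, high = math.ceil(1.5 * nodes), 3 * nodes
    shards = max(by_size, low)                   # the size limit always takes precedence
    return shards, shards <= high                # False: consider adding nodes
```

For this guide's two-node cluster, `suggest_shards(200, 2)` returns `(7, False)`. The 30 GB limit needs 7 shards, which is more than 3 per node, so the result points towards adding nodes. With four nodes, `suggest_shards(200, 4)` returns `(7, True)`.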


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
