
Master EFLK: Deep‑Inspection Guide for Elasticsearch, Filebeat, Logstash & Kibana

This guide presents a comprehensive, step‑by‑step deep‑inspection plan for the EFLK stack, covering Elasticsearch health checks, node performance metrics, shard and index monitoring, Logstash and Kibana validation, DSL query examples, and automated Python‑based metric collection with cron scheduling.

ITPUB

1. Elasticsearch Deep Inspection

1.1 Cluster Health Check

Use GET _cluster/health to retrieve the overall health of the cluster. Key fields to monitor are status (green, yellow, red), number_of_nodes, active_primary_shards, active_shards, and unassigned_shards.
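
As a minimal sketch of how these fields might be evaluated in a script, the function below takes an already-fetched _cluster/health response body and returns a list of findings. The function name, the expected_nodes parameter, and the sample values are illustrative assumptions, not part of the original guide.

```python
from typing import Any, Dict, List, Optional


def assess_cluster_health(health: Dict[str, Any],
                          expected_nodes: Optional[int] = None) -> List[str]:
    """Return human-readable findings for a _cluster/health response body."""
    findings: List[str] = []
    status = health.get("status")
    if status == "red":
        findings.append("status is red: at least one primary shard is unassigned")
    elif status == "yellow":
        findings.append("status is yellow: at least one replica shard is unassigned")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned:
        findings.append(f"{unassigned} unassigned shard(s)")
    nodes = health.get("number_of_nodes", 0)
    if expected_nodes is not None and nodes < expected_nodes:
        findings.append(f"only {nodes} of {expected_nodes} expected nodes joined")
    return findings


# Hypothetical response body, trimmed to the fields discussed above.
sample = {
    "status": "yellow",
    "number_of_nodes": 3,
    "active_primary_shards": 12,
    "active_shards": 20,
    "unassigned_shards": 4,
}
for finding in assess_cluster_health(sample, expected_nodes=4):
    print(finding)
```

An empty list means none of the checked fields looked unhealthy; it does not replace reviewing the full response.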

1.2 Node Performance Monitoring

Query GET _nodes/stats and focus on the following metrics: indices.docs.count (document count), indices.store.size_in_bytes (index size), jvm.mem.heap_used_percent (JVM heap usage), os.cpu.percent (CPU usage), and fs.total.available_in_bytes (available disk space).
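
The metrics above become actionable once they are compared against thresholds. The sketch below flags nodes in an already-fetched _nodes/stats response body; the threshold defaults (85% heap, 90% CPU, 20 GiB free disk) and the sample data are assumptions chosen for illustration and should be tuned per cluster.

```python
from typing import Any, Dict, List

GIB = 1024 ** 3


def flag_nodes(node_stats: Dict[str, Any],
               heap_limit: int = 85,
               cpu_limit: int = 90,
               min_free_gib: float = 20.0) -> Dict[str, List[str]]:
    """Map node name to a list of threshold violations (assumed thresholds)."""
    alerts: Dict[str, List[str]] = {}
    for node_id, info in node_stats.get("nodes", {}).items():
        problems: List[str] = []
        heap = info["jvm"]["mem"]["heap_used_percent"]
        cpu = info["os"]["cpu"]["percent"]
        free_gib = info["fs"]["total"]["available_in_bytes"] / GIB
        if heap > heap_limit:
            problems.append(f"heap {heap}% > {heap_limit}%")
        if cpu > cpu_limit:
            problems.append(f"cpu {cpu}% > {cpu_limit}%")
        if free_gib < min_free_gib:
            problems.append(f"free disk {free_gib:.1f} GiB < {min_free_gib} GiB")
        if problems:
            # Fall back to the node ID when the name is missing.
            alerts[info.get("name", node_id)] = problems
    return alerts


# Hypothetical two-node response, trimmed to the fields used above.
sample_stats = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "jvm": {"mem": {"heap_used_percent": 91}},
            "os": {"cpu": {"percent": 40}},
            "fs": {"total": {"available_in_bytes": 10 * GIB}},
        },
        "def456": {
            "name": "node-2",
            "jvm": {"mem": {"heap_used_percent": 50}},
            "os": {"cpu": {"percent": 30}},
            "fs": {"total": {"available_in_bytes": 200 * GIB}},
        },
    }
}
print(flag_nodes(sample_stats))
```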

1.3 Shard State Monitoring

Run

GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason

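to list all shard details. Pay special attention to the unassigned.reason field. If unassigned shards appear, first run GET _cluster/allocation/explain to see why allocation is blocked, and retry failed allocations with POST /_cluster/reroute?retry_failed=true. Only when a primary shard cannot be recovered any other way should you force allocation of a stale copy with the following command; accept_data_loss: true means that any writes missing from that copy are lost: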

POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "your-index",
        "shard": 0,
        "node": "node-name",
        "accept_data_loss": true
      }
    }
  ]
}
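
Before rerouting anything, it can help to see how many shards are unassigned and why. The following sketch parses the plain-text output of the _cat/shards request shown above and counts unassigned shards per unassigned.reason value. The function name and the sample output are hypothetical; the parser assumes the five-column layout requested with the h= parameter and the header row produced by ?v.

```python
from collections import Counter


def unassigned_by_reason(cat_output: str) -> Counter:
    """Count UNASSIGNED shards per reason in _cat/shards text output."""
    counts: Counter = Counter()
    lines = cat_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row produced by ?v
        cols = line.split()
        # Columns: index, shard, prirep, state, unassigned.reason (may be empty)
        if len(cols) >= 4 and cols[3] == "UNASSIGNED":
            reason = cols[4] if len(cols) > 4 else "UNKNOWN"
            counts[reason] += 1
    return counts


# Hypothetical output for illustration only.
sample_output = """\
index      shard prirep state      unassigned.reason
logs-2024  0     p      STARTED
logs-2024  0     r      UNASSIGNED NODE_LEFT
logs-2024  1     r      UNASSIGNED NODE_LEFT
metrics    0     p      UNASSIGNED ALLOCATION_FAILED
"""
print(unassigned_by_reason(sample_output))
```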

1.4 Index Status Inspection

Execute

GET _cat/indices?v&h=index,health,status,pri,rep,docs.count,store.size

and examine the fields health, status, pri (primary shards), rep (replicas), docs.count, and store.size to assess index health and size.
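
For scripted checks, the same request can be made with format=json added, which returns one object per index with string values. The sketch below, under that assumption, lists indices that are not open, not green, or have no replicas. The function name and sample rows are illustrative only.

```python
from typing import Dict, List


def index_findings(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map index name to reasons it deserves a closer look."""
    findings: Dict[str, List[str]] = {}
    for row in rows:
        reasons: List[str] = []
        if row.get("status") != "open":
            reasons.append(f"status is {row.get('status')}")
        if row.get("health") != "green":
            reasons.append(f"health is {row.get('health')}")
        if row.get("rep") == "0":  # cat APIs return numbers as strings
            reasons.append("no replicas configured")
        if reasons:
            findings[row["index"]] = reasons
    return findings


# Hypothetical rows shaped like _cat/indices?format=json output.
sample_rows = [
    {"index": "logs-a", "health": "green", "status": "open", "pri": "1", "rep": "1"},
    {"index": "logs-b", "health": "yellow", "status": "open", "pri": "1", "rep": "1"},
    {"index": "logs-c", "health": "green", "status": "open", "pri": "1", "rep": "0"},
]
print(index_findings(sample_rows))
```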

1.5 Cluster Performance Profiling

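Set "profile": true in the search request body to obtain a per-shard timing breakdown of each query and collector component, which helps pinpoint performance bottlenecks. Profiling adds overhead, so use it for debugging rather than on every request.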

GET /your-index/_search?pretty
{
  "profile": true,
  "query": {
    "match": {
      "field": "value"
    }
  }
}
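
Profile output is deeply nested, so a small helper can make it easier to read. The sketch below walks the profile.shards[].searches[].query[] tree of a profiled search response and returns the slowest components by time_in_nanos. The function name and the sample response are hypothetical, and a parent component's time includes its children, so the list is a ranking rather than a sum.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[int, str, str, str]  # (time_in_nanos, shard id, type, description)


def slowest_components(response: Dict[str, Any], top: int = 3) -> List[Row]:
    """Return the `top` slowest query components across all shards."""
    rows: List[Row] = []

    def walk(node: Dict[str, Any], shard_id: str) -> None:
        rows.append((node["time_in_nanos"], shard_id,
                     node["type"], node["description"]))
        for child in node.get("children", []):
            walk(child, shard_id)

    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for query in search.get("query", []):
                walk(query, shard["id"])
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:top]


# Hypothetical, heavily trimmed profile response.
sample_response = {
    "profile": {
        "shards": [{
            "id": "[node-id][your-index][0]",
            "searches": [{
                "query": [{
                    "type": "BooleanQuery",
                    "description": "field:value other:thing",
                    "time_in_nanos": 900000,
                    "children": [
                        {"type": "TermQuery", "description": "field:value",
                         "time_in_nanos": 600000},
                        {"type": "TermQuery", "description": "other:thing",
                         "time_in_nanos": 200000},
                    ],
                }]
            }],
        }]
    }
}
for row in slowest_components(sample_response):
    print(row)
```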

2. Filebeat Inspection

2.1 Configuration Check

Verify that Filebeat is installed and running with systemctl status filebeat. Review /etc/filebeat/filebeat.yml to ensure correct input sources (log files, system logs) and proper output destinations (Elasticsearch or Logstash).

2.2 Log Review

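Tail the log file /var/log/filebeat/filebeat to detect errors such as connection failures or permission problems. Depending on the Filebeat version and how it was installed, the log may instead be in date-stamped files in the same directory or in the systemd journal (journalctl -u filebeat).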

tail -f /var/log/filebeat/filebeat

2.3 Config Test

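Validate the configuration with filebeat test config, check connectivity to the configured output with filebeat test output, and run Filebeat in the foreground with filebeat -e, which sends logs to stderr. Add -d "*" to enable debug-level output for all components.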

filebeat test config
filebeat -e
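
When these checks are part of a scheduled inspection, they can be wrapped in a script. The sketch below is a generic command runner, not a Filebeat API: it assumes the filebeat binary is on PATH and that the CLI signals failure through a non-zero exit status. The helper name run_check is hypothetical, and the demonstration uses the Python interpreter as a stand-in command so the sketch runs without Filebeat installed.

```python
import subprocess
import sys
from typing import List, Tuple


def run_check(cmd: List[str], timeout: float = 60.0) -> Tuple[bool, str]:
    """Run a command and return (succeeded, combined stdout/stderr)."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        # Missing binary or a hung command both count as a failed check.
        return False, str(exc)
    return proc.returncode == 0, proc.stdout.strip()


# Stand-in command; on a Filebeat host this would be
# run_check(["filebeat", "test", "config"]).
ok, output = run_check([sys.executable, "-c", "print('Config OK')"])
print("PASS" if ok else "FAIL", output)
```

On a host with Filebeat installed, the same helper could be called with ["filebeat", "test", "config"] and ["filebeat", "test", "output"] in turn.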

3. Logstash Inspection

3.1 Process Check

Confirm Logstash is active via systemctl status logstash.

3.2 Pipeline Configuration

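Inspect pipeline files under /etc/logstash/conf.d/. Ensure the three core sections are correctly defined: input (e.g., beats for Filebeat, kafka), filter (e.g., grok, date), and output (e.g., elasticsearch, file). The filter section is optional, but a pipeline needs at least one input and one output. Logstash can validate the syntax without starting the pipeline when run with --config.test_and_exit.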
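
A quick structural check can catch a missing section before Logstash is restarted. The sketch below is a rough text scan, not a Logstash parser: it looks for top-level input, filter, and output blocks in the combined configuration text. The function name and sample configuration are hypothetical. It defaults to the three sections named above; Logstash itself treats filter as optional, so pass required=("input", "output") for pipelines without one.

```python
import re
from typing import List, Sequence

SECTION_RE = re.compile(r"^\s*(input|filter|output)\s*\{", re.MULTILINE)


def missing_sections(config_text: str,
                     required: Sequence[str] = ("input", "filter", "output")
                     ) -> List[str]:
    """Return required section names with no block opening at a line start."""
    found = set(SECTION_RE.findall(config_text))
    return [name for name in required if name not in found]


# Hypothetical pipeline text; the filter block is commented out.
sample_config = """\
input { beats { port => 5044 } }
# filter { grok { } }
output { elasticsearch { hosts => ["localhost:9200"] } }
"""
print(missing_sections(sample_config))
```

Because Logstash concatenates the files in conf.d into one pipeline, the check is best run on the combined text of all files rather than on each file separately.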

3.3 Log Review

Tail /var/log/logstash/logstash-plain.log and look for connection failures to Elasticsearch or Grok parsing errors.

tail -f /var/log/logstash/logstash-plain.log
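
For a longer log, counting problems per component is often quicker than reading a tail. The sketch below assumes the default plain-log layout of [timestamp][LEVEL][logger] followed by the message; the function name and sample lines are hypothetical, and a customized log4j2 layout would need a different pattern.

```python
import re
from collections import Counter
from typing import Iterable, Sequence

LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>[A-Z]+)\s*\]\[(?P<logger>[^\]\s]+)\s*\]"
)


def count_problems(lines: Iterable[str],
                   levels: Sequence[str] = ("WARN", "ERROR", "FATAL")
                   ) -> Counter:
    """Count log entries per (level, logger) for the given levels."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        # Continuation lines such as stack traces do not match and are skipped.
        if match and match.group("level") in levels:
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts


# Hypothetical log lines for illustration only.
sample_lines = [
    "[2024-05-01T06:00:01,123][INFO ][logstash.agent           ] Pipelines running",
    "[2024-05-01T06:00:02,456][WARN ][logstash.outputs.elasticsearch][main] Connection refused",
    "[2024-05-01T06:00:03,789][WARN ][logstash.outputs.elasticsearch][main] Connection refused",
    "[2024-05-01T06:00:04,012][ERROR][logstash.filters.grok    ][main] Grok failure",
    "    at org.example.SomeClass (continuation line)",
]
for key, total in count_problems(sample_lines).most_common():
    print(key, total)
```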

4. Kibana Inspection

4.1 Process Status

Check Kibana service health with systemctl status kibana.

4.2 Configuration Review

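Open /config/kibana.yml (for DEB/RPM package installs the file is /etc/kibana/kibana.yml) and verify critical settings such as elasticsearch.hosts: ["http://localhost:9200"] and server.host: "0.0.0.0". Binding to 0.0.0.0 makes Kibana listen on all network interfaces, so restrict access with a firewall or reverse proxy where needed.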

4.3 Log Review

Tail /logs/kibana.log for connection errors or startup failures.

tail -f /logs/kibana.log

4.4 UI Checks

Log into Kibana and confirm that the Discover, Dashboards, and Visualizations sections load and function correctly.

5. DSL Query Examples

5.1 Slow Query Log

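The following query assumes that slow log entries have been indexed into Elasticsearch with a numeric took field; took is normally a property of the search response, not a document field. It returns the ten slowest entries from yesterday, and unmapped_type keeps the sort from failing on indices that lack the field.

GET /_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1d/d",
        "lt": "now/d"
      }
    }
  },
  "sort": [{ "took": { "order": "desc", "unmapped_type": "long" } }],
  "size": 10
}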

5.2 Error Log

GET /filebeat-*/_search
{
  "query": {
    "match": {
      "message": "error"
    }
  }
}

5.3 Node‑Specific Issues

GET /_search
{
  "query": {
    "term": {
      "host.name": {
        "value": "node-1"
      }
    }
  }
}
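
In practice the error and host filters are often combined and bounded in time. The sketch below builds such a request body as a Python dictionary, assuming the message, host.name, and @timestamp fields used in the queries above. The function name, the defaults, and the result size are illustrative choices, not part of the original queries.

```python
import json
from typing import Any, Dict


def build_error_query(host: str, text: str = "error",
                      since: str = "now-1d", size: int = 50) -> Dict[str, Any]:
    """Build a search body: full-text match on message, filtered by host and time."""
    return {
        "query": {
            "bool": {
                # Scored full-text match, as in the error log query.
                "must": [{"match": {"message": text}}],
                # Non-scoring filters: exact host and a time window.
                "filter": [
                    {"term": {"host.name": {"value": host}}},
                    {"range": {"@timestamp": {"gte": since}}},
                ],
            }
        },
        "sort": [{"@timestamp": {"order": "desc"}}],
        "size": size,
    }


body = build_error_query("node-1")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into Kibana Dev Tools after a line such as GET /filebeat-*/_search.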

6. Enterprise Automation with Python

6.1 Metric Collection Script

The following Python script is written against version 8.x of the official elasticsearch client (it uses basic_auth and the .body attribute of responses). It reads connection settings from conf/config.ini, gathers cluster health and node statistics (CPU usage, load average, memory and heap usage, disk availability), and appends the results as JSON to elasticsearch_metrics.log.

import json
from datetime import datetime
import configparser
from elasticsearch import Elasticsearch  # official client, 8.x API

def init_es_client(config_path='./conf/config.ini'):
    """Initialize and return an Elasticsearch client from an INI config file."""
    config = configparser.ConfigParser()
    # config.read() silently skips missing files, so fail early with a clear error.
    if not config.read(config_path):
        raise FileNotFoundError(f"Config file not found: {config_path}")
    es_host = config.get('elasticsearch', 'ES_HOST')
    es_user = config.get('elasticsearch', 'ES_USER')
    es_password = config.get('elasticsearch', 'ES_PASSWORD')
    es = Elasticsearch(
        hosts=[es_host],
        basic_auth=(es_user, es_password),
        # Verify the server certificate against the cluster CA rather than
        # disabling verification and silencing the resulting warnings.
        verify_certs=True,
        ca_certs='conf/http_ca.crt'
    )
    return es

LOG_FILE = 'elasticsearch_metrics.log'
es = init_es_client()

def get_cluster_health():
    return es.cluster.health().body

def get_node_stats():
    return es.nodes.stats().body

def get_cluster_metrics():
    metrics = {}
    metrics['cluster_health'] = get_cluster_health()
    node_stats = get_node_stats().get('nodes', {})
    metrics['nodes'] = {}
    for node_id, info in node_stats.items():
        name = info.get('name')
        metrics['nodes'][name] = {
            'cpu_usage': info['os']['cpu']['percent'],
            'load_average': info['os']['cpu'].get('load_average', {}).get('1m'),
            'memory_used': info['os']['mem']['used_percent'],
            'heap_used': info['jvm']['mem']['heap_used_percent'],
            'disk_available': info['fs']['total']['available_in_bytes'] / (1024**3),
            'disk_total': info['fs']['total']['total_in_bytes'] / (1024**3),
            'disk_usage_percent': 100 - (
                info['fs']['total']['available_in_bytes'] * 100 /
                info['fs']['total']['total_in_bytes']
            )
        }
    return metrics

def log_metrics():
    metrics = get_cluster_metrics()
    timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    with open(LOG_FILE, 'a') as f:
        f.write(f"Timestamp: {timestamp}\n")
        f.write(json.dumps(metrics, indent=4))
        f.write('\n\n')

if __name__ == "__main__":
    log_metrics()
    print("Elasticsearch cluster metrics logged successfully.")
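
To review the collected history, the log file has to be read back. The sketch below assumes the layout the script is meant to produce: a "Timestamp: ..." line followed by an indented JSON document and a blank line. The function name parse_metrics_log and the sample text are hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_metrics_log(text: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Split the metrics log into (timestamp, metrics) records."""
    records: List[Tuple[str, Dict[str, Any]]] = []
    # Everything before the first marker is ignored; each later chunk holds
    # the timestamp on its first line and the JSON document after it.
    for chunk in text.split("Timestamp: ")[1:]:
        stamp, _, body = chunk.partition("\n")
        records.append((stamp.strip(), json.loads(body)))
    return records


# Hypothetical two-entry log in the assumed layout.
sample_log = (
    "Timestamp: 2024-05-01 06:00:00\n"
    + json.dumps({"cluster_health": {"status": "green"}}, indent=4)
    + "\n\n"
    + "Timestamp: 2024-05-02 06:00:00\n"
    + json.dumps({"cluster_health": {"status": "yellow"}}, indent=4)
    + "\n\n"
)
for stamp, metrics in parse_metrics_log(sample_log):
    print(stamp, metrics["cluster_health"]["status"])
```

Writing one JSON object per line would make the file simpler to parse, but the sketch keeps to the format the script above writes.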

6.2 Cron Scheduling

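Schedule the script to run daily at 06:00 using a crontab entry. The script resolves conf/config.ini and elasticsearch_metrics.log relative to the working directory, so change into the script directory first:

0 6 * * * cd /home/user/scripts && /usr/bin/python3 es_metrics.py >> /home/user/scripts/es_metrics_cron.log 2>&1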

7. Conclusion

Applying the detailed inspection steps and automating metric collection enables early detection of health or performance problems across the EFLK stack, which helps keep it running stably. Complementary observability tools such as Prometheus, Zabbix, or Grafana can further extend monitoring coverage.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
