
Comprehensive EFLK (Elasticsearch, Filebeat, Logstash, Kibana) Deep Inspection and Monitoring Guide

This guide details a step-by-step deep-inspection and monitoring strategy for an Elasticsearch-Filebeat-Logstash-Kibana (EFLK) stack, covering cluster health, node and shard metrics, index status, query profiling, Filebeat, Logstash, and Kibana validation, DSL query examples, and a Python script for automated metric collection.


Ensuring the stable operation of an Elasticsearch‑Filebeat‑Logstash‑Kibana (EFLK) stack is critical for both operations and big‑data environments. This guide presents a thorough deep‑inspection plan covering health checks, performance metrics, shard status, index health, and query profiling for each component.

1. Elasticsearch Deep Inspection

1.1 Cluster Health Check

Use the `GET _cluster/health` API to retrieve overall cluster health. Key fields to monitor are `status` (green/yellow/red), `number_of_nodes`, `active_primary_shards`, `active_shards`, and `unassigned_shards`.
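As a sketch, the returned JSON can be evaluated programmatically. The field names below are the real API fields; the severity mapping (yellow as a warning, red as critical) is an illustrative assumption:

```python
# Hypothetical helper: classify a _cluster/health response body.
# The severity thresholds are an illustrative assumption.
def classify_cluster_health(health: dict) -> str:
    status = health.get("status")
    if status == "green" and health.get("unassigned_shards", 0) == 0:
        return "ok"
    if status == "yellow":
        return "warning"
    return "critical"

# Example health body, trimmed to the monitored fields:
health = {
    "status": "yellow",
    "number_of_nodes": 3,
    "active_primary_shards": 10,
    "active_shards": 18,
    "unassigned_shards": 2,
}
print(classify_cluster_health(health))  # warning
```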

1.2 Node Performance Monitoring

Query node statistics with `GET _nodes/stats`. Important metrics include `indices.docs.count`, `indices.store.size_in_bytes`, `jvm.mem.heap_used_percent`, `os.cpu.percent`, and `fs.total.available_in_bytes`.
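The dotted paths above map directly onto the nested JSON of one node's stats entry, so a small helper can flatten just the monitored values. This is a sketch; the sample values are made up:

```python
# Hypothetical helper: pull the monitored metrics out of one node's
# entry in the _nodes/stats response. Key paths follow the API fields
# listed above.
def extract_node_metrics(node: dict) -> dict:
    return {
        "docs": node["indices"]["docs"]["count"],
        "store_bytes": node["indices"]["store"]["size_in_bytes"],
        "heap_used_percent": node["jvm"]["mem"]["heap_used_percent"],
        "cpu_percent": node["os"]["cpu"]["percent"],
        "fs_available_bytes": node["fs"]["total"]["available_in_bytes"],
    }

# Sample stats entry, trimmed to the fields used above:
sample = {
    "indices": {"docs": {"count": 120000}, "store": {"size_in_bytes": 5_368_709_120}},
    "jvm": {"mem": {"heap_used_percent": 62}},
    "os": {"cpu": {"percent": 35}},
    "fs": {"total": {"available_in_bytes": 42_949_672_960}},
}
print(extract_node_metrics(sample)["heap_used_percent"])  # 62
```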

1.3 Shard Status Monitoring

List shard details via `GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason`. Pay special attention to the `unassigned.reason` field. If a primary stays unassigned because only stale copies remain, it can be force-allocated as a last resort (note that this accepts possible data loss):

```
POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "your-index",
        "shard": 0,
        "node": "node-name",
        "accept_data_loss": true
      }
    }
  ]
}
```
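Because `_cat/shards` returns whitespace-separated text rather than JSON, a small parser can surface only the problem shards. This sketch assumes the column order requested above (index, shard, prirep, state, unassigned.reason):

```python
# Hypothetical parser for the _cat/shards?v output requested above.
def unassigned_shards(cat_output: str):
    rows = []
    for line in cat_output.strip().splitlines()[1:]:  # skip the ?v header row
        parts = line.split()
        if len(parts) >= 4 and parts[3] == "UNASSIGNED":
            reason = parts[4] if len(parts) > 4 else None
            rows.append({"index": parts[0], "shard": parts[1], "reason": reason})
    return rows

sample = """index shard prirep state unassigned.reason
logs-2024 0 p STARTED
logs-2024 1 p UNASSIGNED NODE_LEFT
"""
print(unassigned_shards(sample))
# [{'index': 'logs-2024', 'shard': '1', 'reason': 'NODE_LEFT'}]
```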

1.4 Index Status Inspection

Check index health using `GET _cat/indices?v&h=index,health,status,pri,rep,docs.count,store.size`. Monitor `health`, `status`, `pri`, `rep`, `docs.count`, and `store.size`.

1.5 Cluster Performance Analysis (Profile Query)

Enable the profile parameter in search requests to obtain per‑phase execution times, e.g.:

```
GET /your-index/_search?pretty
{
  "profile": true,
  "query": {
    "match": { "field": "value" }
  }
}
```

The response highlights the most time‑consuming stages, helping to pinpoint bottlenecks.
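Profile responses can be deeply nested, so locating the slowest query node by hand is tedious. The traversal below is a sketch: the overall shape (shards, searches, a query tree with `time_in_nanos`) follows the profile API, while the sample timings are made up:

```python
# Hypothetical helper: find the single most expensive node in the
# profiled query tree of a response's "profile" section.
def slowest_query_node(profile: dict):
    worst = None

    def walk(node):
        nonlocal worst
        if worst is None or node["time_in_nanos"] > worst["time_in_nanos"]:
            worst = node
        for child in node.get("children", []):
            walk(child)

    for shard in profile.get("shards", []):
        for search in shard.get("searches", []):
            for root in search.get("query", []):
                walk(root)
    return worst

# Trimmed sample profile section with invented timings:
sample = {"shards": [{"searches": [{"query": [
    {"type": "BooleanQuery", "time_in_nanos": 900000, "children": [
        {"type": "TermQuery", "time_in_nanos": 600000},
        {"type": "TermQuery", "time_in_nanos": 250000},
    ]}
]}]}]}
print(slowest_query_node(sample)["type"])  # BooleanQuery
```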

2. Filebeat Inspection

2.1 Configuration Check

Verify that Filebeat is installed and running (`systemctl status filebeat`) and that `/etc/filebeat/filebeat.yml` contains the correct input sources and output destination (Elasticsearch or Logstash).

2.2 Log Examination

Tail the Filebeat log (`tail -f /var/log/filebeat/filebeat`) to ensure there are no connection errors or permission issues.

2.3 Configuration Test

Validate the config with `filebeat test config`, check connectivity to the configured output with `filebeat test output`, and run Filebeat in the foreground for debugging (`filebeat -e`).

3. Logstash Inspection

3.1 Process Check

Confirm Logstash is active via systemctl status logstash .

3.2 Pipeline Configuration Review

Inspect pipeline files under `/etc/logstash/conf.d/` to ensure each has proper input, filter, and output sections.
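For reference, a minimal pipeline containing all three sections might look like the following (the beats port, grok pattern, and index name are illustrative assumptions, not values from this guide):

```
input {
  beats { port => 5044 }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
```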

3.3 Log Review

Tail Logstash logs (`tail -f /var/log/logstash/logstash-plain.log`) and look for connection failures or Grok parsing errors.

4. Kibana Inspection

4.1 Process Status

Check Kibana service health with systemctl status kibana .

4.2 Configuration Validation

Review `/etc/kibana/kibana.yml` (or `config/kibana.yml` under the install directory for archive installs), ensuring `elasticsearch.hosts` points to the cluster and `server.host` is set to `0.0.0.0` if external access is required.
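A matching fragment of `kibana.yml` might look like the following (the host name and port are illustrative assumptions):

```
# Illustrative kibana.yml fragment; hosts and port are assumptions.
server.host: "0.0.0.0"
server.port: 5601
elasticsearch.hosts: ["http://es-node-1:9200"]
```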

4.3 Log Examination

Inspect Kibana logs (`tail -f logs/kibana.log`, or `journalctl -u kibana` on systemd installs) for connection issues or startup failures.

4.4 UI Verification

Log into Kibana and confirm that Discover, Dashboards, and Visualizations load correctly.

5. DSL Query Examples

5.1 Slow‑Query Log

Find slow queries in the last day:

GET /_search { "query": { "range": { "@timestamp": { "gte": "now-1d/d", "lt": "now/d" } } }, "sort": [ { "took": { "order": "desc" } } ], "size": 10 }

5.2 Error Log Search

Search for error messages across Filebeat indices:

```
GET /filebeat-*/_search
{
  "query": {
    "match": { "message": "error" }
  }
}
```

5.3 Node‑Specific Logs

Retrieve logs for a particular node:

GET /_search { "query": { "term": { "host.name": { "value": "node-1" } } } }

6. Enterprise Automation: Python Metrics Collector

A Python script (shown below) uses the elasticsearch client to gather cluster health, node CPU, load average, memory, JVM heap, and disk usage, then writes the data as JSON to a log file.

```python
import json
import configparser
import warnings
from datetime import datetime

from elasticsearch import Elasticsearch

warnings.filterwarnings("ignore")

LOG_FILE = 'elasticsearch_metrics.log'


def init_es_client(config_path='./conf/config.ini'):
    """Initialize and return an Elasticsearch client."""
    cfg = configparser.ConfigParser()
    cfg.read(config_path)
    es_host = cfg.get('elasticsearch', 'ES_HOST')
    es_user = cfg.get('elasticsearch', 'ES_USER')
    es_password = cfg.get('elasticsearch', 'ES_PASSWORD')
    return Elasticsearch(
        hosts=[es_host],
        basic_auth=(es_user, es_password),
        verify_certs=False,
        ca_certs='conf/http_ca.crt',
    )


es = init_es_client()


def get_cluster_health():
    return es.cluster.health().body


def get_node_stats():
    return es.nodes.stats().body


def get_cluster_metrics():
    """Collect cluster health plus per-node CPU, load, memory, heap, and disk."""
    metrics = {'cluster_health': get_cluster_health(), 'nodes': {}}
    for nid, info in get_node_stats().get('nodes', {}).items():
        fs = info['fs']['total']
        metrics['nodes'][info.get('name')] = {
            'cpu_usage': info['os']['cpu']['percent'],
            'load_average': info['os']['cpu'].get('load_average', {}).get('1m'),
            'memory_used': info['os']['mem']['used_percent'],
            'heap_used': info['jvm']['mem']['heap_used_percent'],
            'disk_available': fs['available_in_bytes'] / (1024 ** 3),
            'disk_total': fs['total_in_bytes'] / (1024 ** 3),
            'disk_usage_percent': 100 - (fs['available_in_bytes'] * 100 / fs['total_in_bytes']),
        }
    return metrics


def log_metrics():
    """Append the collected metrics as timestamped JSON to the log file."""
    metrics = get_cluster_metrics()
    ts = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    with open(LOG_FILE, 'a') as f:
        f.write(f"Timestamp: {ts}\n")
        f.write(json.dumps(metrics, indent=4))
        f.write('\n\n')


if __name__ == "__main__":
    log_metrics()
    print("Elasticsearch cluster metrics logged successfully.")
```

The script can be scheduled with cron to run daily at 06:00:

```
0 6 * * * /usr/bin/python3 /home/user/scripts/es_metrics.py >> /home/user/scripts/es_metrics_cron.log 2>&1
```

7. Conclusion

By following this deep‑inspection framework and automating metric collection, teams can maintain full visibility into the health and performance of the EFLK stack, quickly detect issues, and ensure reliable operation. Integrating monitoring tools such as Prometheus, Zabbix, or Grafana further enhances observability.

Written by

Sohu Tech Products

A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
