
Comprehensive EFLK (Elasticsearch, Filebeat, Logstash, Kibana) Deep Inspection and Monitoring Guide

This guide details a step-by-step deep-inspection and monitoring strategy for an Elasticsearch-Filebeat-Logstash-Kibana (EFLK) stack, covering cluster health, node and shard metrics, index status, query profiling, Filebeat, Logstash, and Kibana validation, DSL query examples, and a Python script for automated metric collection.


Ensuring the stable operation of an Elasticsearch‑Filebeat‑Logstash‑Kibana (EFLK) stack is critical for both operations and big‑data environments. This guide presents a thorough deep‑inspection plan covering health checks, performance metrics, shard status, index health, and query profiling for each component.

1. Elasticsearch Deep Inspection

1.1 Cluster Health Check

Use the `GET _cluster/health` API to retrieve overall cluster health. Key fields to monitor are `status` (green/yellow/red), `number_of_nodes`, `active_primary_shards`, `active_shards`, and `unassigned_shards`.
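As a sketch, the returned JSON can be evaluated programmatically. The field names below are the real API fields; the severity mapping (yellow as a warning, red as critical) is an illustrative assumption:

```python
# Hypothetical helper: classify a _cluster/health response body.
# The severity thresholds are an illustrative assumption.
def classify_cluster_health(health: dict) -> str:
    status = health.get("status")
    if status == "green" and health.get("unassigned_shards", 0) == 0:
        return "ok"
    if status == "yellow":
        return "warning"
    return "critical"

# Example health body, trimmed to the monitored fields:
health = {
    "status": "yellow",
    "number_of_nodes": 3,
    "active_primary_shards": 10,
    "active_shards": 18,
    "unassigned_shards": 2,
}
print(classify_cluster_health(health))  # warning
```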

1.2 Node Performance Monitoring

Query node statistics with `GET _nodes/stats`. Important metrics include `indices.docs.count`, `indices.store.size_in_bytes`, `jvm.mem.heap_used_percent`, `os.cpu.percent`, and `fs.total.available_in_bytes`.
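The dotted paths above map directly onto the nested JSON of one node's stats entry, so a small helper can flatten just the monitored values. This is a sketch; the sample values are made up:

```python
# Hypothetical helper: pull the monitored metrics out of one node's
# entry in the _nodes/stats response. Key paths follow the API fields
# listed above.
def extract_node_metrics(node: dict) -> dict:
    return {
        "docs": node["indices"]["docs"]["count"],
        "store_bytes": node["indices"]["store"]["size_in_bytes"],
        "heap_used_percent": node["jvm"]["mem"]["heap_used_percent"],
        "cpu_percent": node["os"]["cpu"]["percent"],
        "fs_available_bytes": node["fs"]["total"]["available_in_bytes"],
    }

# Sample stats entry, trimmed to the fields used above:
sample = {
    "indices": {"docs": {"count": 120000}, "store": {"size_in_bytes": 5_368_709_120}},
    "jvm": {"mem": {"heap_used_percent": 62}},
    "os": {"cpu": {"percent": 35}},
    "fs": {"total": {"available_in_bytes": 42_949_672_960}},
}
print(extract_node_metrics(sample)["heap_used_percent"])  # 62
```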

1.3 Shard Status Monitoring

List shard details via `GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason`. Pay special attention to the `unassigned.reason` field. If a primary stays unassigned because only stale copies remain, it can be force-allocated as a last resort (note that this accepts possible data loss):

```
POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "your-index",
        "shard": 0,
        "node": "node-name",
        "accept_data_loss": true
      }
    }
  ]
}
```
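Because `_cat/shards` returns whitespace-separated text rather than JSON, a small parser can surface only the problem shards. This sketch assumes the column order requested above (index, shard, prirep, state, unassigned.reason):

```python
# Hypothetical parser for the _cat/shards?v output requested above.
def unassigned_shards(cat_output: str):
    rows = []
    for line in cat_output.strip().splitlines()[1:]:  # skip the ?v header row
        parts = line.split()
        if len(parts) >= 4 and parts[3] == "UNASSIGNED":
            reason = parts[4] if len(parts) > 4 else None
            rows.append({"index": parts[0], "shard": parts[1], "reason": reason})
    return rows

sample = """index shard prirep state unassigned.reason
logs-2024 0 p STARTED
logs-2024 1 p UNASSIGNED NODE_LEFT
"""
print(unassigned_shards(sample))
# [{'index': 'logs-2024', 'shard': '1', 'reason': 'NODE_LEFT'}]
```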

1.4 Index Status Inspection

Check index health using `GET _cat/indices?v&h=index,health,status,pri,rep,docs.count,store.size`. Monitor `health`, `status`, `pri`, `rep`, `docs.count`, and `store.size`.

1.5 Cluster Performance Analysis (Profile Query)

Enable the profile parameter in search requests to obtain per‑phase execution times, e.g.:

```
GET /your-index/_search?pretty
{
  "profile": true,
  "query": {
    "match": { "field": "value" }
  }
}
```

The response highlights the most time‑consuming stages, helping to pinpoint bottlenecks.
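Profile responses can be deeply nested, so locating the slowest query node by hand is tedious. The traversal below is a sketch: the overall shape (shards, searches, a query tree with `time_in_nanos`) follows the profile API, while the sample timings are made up:

```python
# Hypothetical helper: find the single most expensive node in the
# profiled query tree of a response's "profile" section.
def slowest_query_node(profile: dict):
    worst = None

    def walk(node):
        nonlocal worst
        if worst is None or node["time_in_nanos"] > worst["time_in_nanos"]:
            worst = node
        for child in node.get("children", []):
            walk(child)

    for shard in profile.get("shards", []):
        for search in shard.get("searches", []):
            for root in search.get("query", []):
                walk(root)
    return worst

# Trimmed sample profile section with invented timings:
sample = {"shards": [{"searches": [{"query": [
    {"type": "BooleanQuery", "time_in_nanos": 900000, "children": [
        {"type": "TermQuery", "time_in_nanos": 600000},
        {"type": "TermQuery", "time_in_nanos": 250000},
    ]}
]}]}]}
print(slowest_query_node(sample)["type"])  # BooleanQuery
```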

2. Filebeat Inspection

2.1 Configuration Check

Verify that Filebeat is installed and running (`systemctl status filebeat`) and that `/etc/filebeat/filebeat.yml` contains the correct input sources and output destination (Elasticsearch or Logstash).

2.2 Log Examination

Tail the Filebeat log (`tail -f /var/log/filebeat/filebeat`) to ensure there are no connection errors or permission issues.

2.3 Configuration Test

Validate the config with `filebeat test config`, check connectivity to the configured output with `filebeat test output`, and run Filebeat in the foreground for debugging (`filebeat -e`).

3. Logstash Inspection

3.1 Process Check

Confirm Logstash is active via systemctl status logstash .

3.2 Pipeline Configuration Review

Inspect pipeline files under `/etc/logstash/conf.d/` to ensure each has proper input, filter, and output sections.
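For reference, a minimal pipeline containing all three sections might look like the following (the beats port, grok pattern, and index name are illustrative assumptions, not values from this guide):

```
input {
  beats { port => 5044 }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
```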

3.3 Log Review

Tail Logstash logs (`tail -f /var/log/logstash/logstash-plain.log`) and look for connection failures or Grok parsing errors.

4. Kibana Inspection

4.1 Process Status

Check Kibana service health with systemctl status kibana .

4.2 Configuration Validation

Review `/etc/kibana/kibana.yml` (or `config/kibana.yml` under the install directory for archive installs), ensuring `elasticsearch.hosts` points to the cluster and `server.host` is set to `0.0.0.0` if external access is required.
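A matching fragment of `kibana.yml` might look like the following (the host name and port are illustrative assumptions):

```
# Illustrative kibana.yml fragment; hosts and port are assumptions.
server.host: "0.0.0.0"
server.port: 5601
elasticsearch.hosts: ["http://es-node-1:9200"]
```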

4.3 Log Examination

Inspect Kibana logs (`tail -f logs/kibana.log`, or `journalctl -u kibana` on systemd installs) for connection issues or startup failures.

4.4 UI Verification

Log into Kibana and confirm that Discover, Dashboards, and Visualizations load correctly.

5. DSL Query Examples

5.1 Slow‑Query Log

Find slow queries in the last day:

GET /_search { "query": { "range": { "@timestamp": { "gte": "now-1d/d", "lt": "now/d" } } }, "sort": [ { "took": { "order": "desc" } } ], "size": 10 }

5.2 Error Log Search

Search for error messages across Filebeat indices:

```
GET /filebeat-*/_search
{
  "query": {
    "match": { "message": "error" }
  }
}
```

5.3 Node‑Specific Logs

Retrieve logs for a particular node:

GET /_search { "query": { "term": { "host.name": { "value": "node-1" } } } }

6. Enterprise Automation: Python Metrics Collector

A Python script (shown below) uses the elasticsearch client to gather cluster health, node CPU, load average, memory, JVM heap, and disk usage, then writes the data as JSON to a log file.

```python
import json
import configparser
import warnings
from datetime import datetime

from elasticsearch import Elasticsearch

warnings.filterwarnings("ignore")

LOG_FILE = 'elasticsearch_metrics.log'


def init_es_client(config_path='./conf/config.ini'):
    """Initialize and return an Elasticsearch client."""
    cfg = configparser.ConfigParser()
    cfg.read(config_path)
    es_host = cfg.get('elasticsearch', 'ES_HOST')
    es_user = cfg.get('elasticsearch', 'ES_USER')
    es_password = cfg.get('elasticsearch', 'ES_PASSWORD')
    return Elasticsearch(
        hosts=[es_host],
        basic_auth=(es_user, es_password),
        verify_certs=False,
        ca_certs='conf/http_ca.crt',
    )


es = init_es_client()


def get_cluster_health():
    return es.cluster.health().body


def get_node_stats():
    return es.nodes.stats().body


def get_cluster_metrics():
    """Collect cluster health plus per-node CPU, load, memory, heap, and disk."""
    metrics = {'cluster_health': get_cluster_health(), 'nodes': {}}
    for nid, info in get_node_stats().get('nodes', {}).items():
        fs = info['fs']['total']
        metrics['nodes'][info.get('name')] = {
            'cpu_usage': info['os']['cpu']['percent'],
            'load_average': info['os']['cpu'].get('load_average', {}).get('1m'),
            'memory_used': info['os']['mem']['used_percent'],
            'heap_used': info['jvm']['mem']['heap_used_percent'],
            'disk_available': fs['available_in_bytes'] / (1024 ** 3),
            'disk_total': fs['total_in_bytes'] / (1024 ** 3),
            'disk_usage_percent': 100 - (fs['available_in_bytes'] * 100 / fs['total_in_bytes']),
        }
    return metrics


def log_metrics():
    """Append the collected metrics as timestamped JSON to the log file."""
    metrics = get_cluster_metrics()
    ts = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    with open(LOG_FILE, 'a') as f:
        f.write(f"Timestamp: {ts}\n")
        f.write(json.dumps(metrics, indent=4))
        f.write('\n\n')


if __name__ == "__main__":
    log_metrics()
    print("Elasticsearch cluster metrics logged successfully.")
```

The script can be scheduled with cron to run daily at 06:00:

```
0 6 * * * /usr/bin/python3 /home/user/scripts/es_metrics.py >> /home/user/scripts/es_metrics_cron.log 2>&1
```

7. Conclusion

By following this deep‑inspection framework and automating metric collection, teams can maintain full visibility into the health and performance of the EFLK stack, quickly detect issues, and ensure reliable operation. Integrating monitoring tools such as Prometheus, Zabbix, or Grafana further enhances observability.

Written by

Sohu Tech Products

A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
