Master EFLK: Deep‑Inspection Guide for Elasticsearch, Filebeat, Logstash & Kibana
This guide presents a comprehensive, step‑by‑step deep‑inspection plan for the EFLK stack, covering Elasticsearch health checks, node performance metrics, shard and index monitoring, Logstash and Kibana validation, DSL query examples, and automated Python‑based metric collection with cron scheduling.
1. Elasticsearch Deep Inspection
1.1 Cluster Health Check
Use GET _cluster/health to retrieve the overall health of the cluster. Key fields to monitor are status (green, yellow, red), number_of_nodes, active_primary_shards, active_shards, and unassigned_shards.
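As a minimal sketch of how the fields above might be evaluated once the response has been fetched, the function below turns a health response body into a list of warnings. The function name, the expected_nodes parameter, and the sample values are illustrative assumptions, not part of any Elasticsearch client.

```python
def summarize_cluster_health(health, expected_nodes=None):
    """Return human-readable warnings for a _cluster/health response body."""
    warnings = []
    status = health.get("status")
    if status == "red":
        warnings.append("status is red: at least one primary shard is unassigned")
    elif status == "yellow":
        warnings.append("status is yellow: at least one replica shard is unassigned")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned:
        warnings.append(f"{unassigned} unassigned shard(s)")
    nodes = health.get("number_of_nodes")
    if expected_nodes is not None and nodes != expected_nodes:
        warnings.append(f"expected {expected_nodes} nodes, found {nodes}")
    return warnings


# Hypothetical response body, trimmed to the fields discussed above.
sample = {
    "status": "yellow",
    "number_of_nodes": 3,
    "active_primary_shards": 10,
    "active_shards": 18,
    "unassigned_shards": 2,
}
for line in summarize_cluster_health(sample, expected_nodes=3):
    print(line)
```

An empty list means none of the checked fields needs attention.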
1.2 Node Performance Monitoring
Query GET _nodes/stats and focus on the following metrics: indices.docs.count (document count), indices.store.size_in_bytes (index size), jvm.mem.heap_used_percent (JVM heap usage), os.cpu.percent (CPU usage), and fs.total.available_in_bytes (available disk space).
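The sketch below shows one way these node metrics could be compared against alert thresholds. The threshold values and the function name are assumptions chosen for illustration; the dictionary paths follow the metric names listed above.

```python
# Hypothetical alert thresholds; tune them for your own cluster.
HEAP_WARN_PERCENT = 85
CPU_WARN_PERCENT = 90
DISK_MIN_FREE_RATIO = 0.15


def find_stressed_nodes(nodes_stats):
    """Map node name -> list of exceeded thresholds, given a _nodes/stats body."""
    problems = {}
    for node_id, info in nodes_stats.get("nodes", {}).items():
        issues = []
        heap = info["jvm"]["mem"]["heap_used_percent"]
        cpu = info["os"]["cpu"]["percent"]
        fs_total = info["fs"]["total"]
        total = fs_total["total_in_bytes"]
        # Guard against a zero total so the ratio never divides by zero.
        free_ratio = fs_total["available_in_bytes"] / total if total else 0.0
        if heap >= HEAP_WARN_PERCENT:
            issues.append(f"heap {heap}% >= {HEAP_WARN_PERCENT}%")
        if cpu >= CPU_WARN_PERCENT:
            issues.append(f"cpu {cpu}% >= {CPU_WARN_PERCENT}%")
        if free_ratio < DISK_MIN_FREE_RATIO:
            issues.append(f"disk free {free_ratio:.0%} < {DISK_MIN_FREE_RATIO:.0%}")
        if issues:
            problems[info.get("name", node_id)] = issues
    return problems


# Hypothetical two-node response, trimmed to the fields used here.
sample_stats = {
    "nodes": {
        "a1": {
            "name": "node-1",
            "jvm": {"mem": {"heap_used_percent": 91}},
            "os": {"cpu": {"percent": 40}},
            "fs": {"total": {"available_in_bytes": 50, "total_in_bytes": 1000}},
        },
        "b2": {
            "name": "node-2",
            "jvm": {"mem": {"heap_used_percent": 50}},
            "os": {"cpu": {"percent": 20}},
            "fs": {"total": {"available_in_bytes": 600, "total_in_bytes": 1000}},
        },
    }
}
print(find_stressed_nodes(sample_stats))
```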
1.3 Shard State Monitoring
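Run
GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason
to list all shard details. Pay special attention to the unassigned.reason field. If unassigned shards appear, first diagnose the cause with GET _cluster/allocation/explain. As a last resort, when no up-to-date copy of a primary shard remains, force allocation of a stale copy with the following command. Setting accept_data_loss to true acknowledges that any writes missing from that copy are lost: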
POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "your-index",
        "shard": 0,
        "node": "node-name",
        "accept_data_loss": true
      }
    }
  ]
}
1.4 Index Status Inspection
Execute
GET _cat/indices?v&h=index,health,status,pri,rep,docs.count,store.size
and examine the fields health, status, pri (primary shards), rep (replicas), docs.count, and store.size to assess index health and size.
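To illustrate how this listing could be screened automatically, the sketch below assumes the same request was made with format=json, so that each index arrives as a dictionary of string values keyed by the column names above. The function name and the sample rows are hypothetical.

```python
def flag_unhealthy_indices(rows):
    """Return (index, health, status) for indices that are not green and open."""
    flagged = []
    for row in rows:
        health = row.get("health")
        status = row.get("status")
        if health != "green" or status != "open":
            flagged.append((row["index"], health, status))
    return flagged


# Hypothetical rows in the shape of a _cat/indices?format=json response.
sample_rows = [
    {"index": "filebeat-000001", "health": "green", "status": "open",
     "pri": "1", "rep": "1", "docs.count": "1200", "store.size": "2.1mb"},
    {"index": "filebeat-000002", "health": "yellow", "status": "open",
     "pri": "1", "rep": "1", "docs.count": "300", "store.size": "640kb"},
]
for index, health, status in flag_unhealthy_indices(sample_rows):
    print(f"{index}: health={health}, status={status}")
```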
1.5 Cluster Performance Profiling
Enable the profile:true parameter in a search request to obtain per‑phase execution times, helping to pinpoint performance bottlenecks.
GET /your-index/_search?pretty
{
  "profile": true,
  "query": {
    "match": {
      "field": "value"
    }
  }
}
2. Filebeat Inspection
2.1 Configuration Check
Verify that Filebeat is installed and running with systemctl status filebeat. Review /etc/filebeat/filebeat.yml to ensure correct input sources (log files, system logs) and proper output destinations (Elasticsearch or Logstash).
2.2 Log Review
Tail the log file /var/log/filebeat/filebeat to detect errors such as connection failures or permission problems.
tail -f /var/log/filebeat/filebeat
2.3 Config Test
Validate the configuration with filebeat test config, then run Filebeat in the foreground with filebeat -e, which sends log output to stderr. Add -d "*" to enable debug output for all selectors.
filebeat test config
filebeat -e -d "*"
3. Logstash Inspection
3.1 Process Check
Confirm Logstash is active via systemctl status logstash.
3.2 Pipeline Configuration
Inspect pipeline files under /etc/logstash/conf.d/. Ensure the three core sections are correctly defined: input (e.g., Filebeat, Kafka), filter (e.g., Grok, date), and output (e.g., Elasticsearch, file).
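A rough way to automate this check is sketched below. It looks only for the opening of each section and does not parse the pipeline grammar, so it is a sanity check rather than a validator. The helper names are assumptions, and concatenating the files in sorted order approximates how Logstash merges the files under conf.d. Logstash itself treats filter as optional, so a missing filter section is informational rather than an error.

```python
import re
from pathlib import Path

# Matches a section name at the start of a line followed by an opening brace.
SECTION_RE = re.compile(r"^\s*(input|filter|output)\s*\{", re.MULTILINE)


def missing_sections(config_text):
    """Return the section names that never open a block in config_text."""
    found = set(SECTION_RE.findall(config_text))
    return {"input", "filter", "output"} - found


def load_pipeline_text(conf_dir):
    """Concatenate all *.conf files in conf_dir in sorted (lexicographic) order."""
    files = sorted(Path(conf_dir).glob("*.conf"))
    return "\n".join(path.read_text() for path in files)


# Hypothetical pipeline whose filter section is commented out.
sample = """
input { beats { port => 5044 } }
# filter { grok { } }
output { elasticsearch { hosts => ["http://localhost:9200"] } }
"""
print(sorted(missing_sections(sample)))  # prints ['filter']
```

Against a real installation, missing_sections(load_pipeline_text("/etc/logstash/conf.d")) would apply the same check to the directory named above.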
3.3 Log Review
Tail /var/log/logstash/logstash-plain.log and look for connection failures to Elasticsearch or Grok parsing errors.
tail -f /var/log/logstash/logstash-plain.log
4. Kibana Inspection
4.1 Process Status
Check Kibana service health with systemctl status kibana.
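The three systemctl status checks (Filebeat, Logstash, Kibana) can be folded into one script. The sketch below uses systemctl is-active, a related subcommand that prints a single state word per unit, instead of parsing the full status output. The function name and the injectable run parameter are assumptions made so the logic can be exercised without systemd.

```python
import subprocess


def service_states(services, run=subprocess.run):
    """Return {service: state} using `systemctl is-active <service>`."""
    states = {}
    for name in services:
        try:
            result = run(
                ["systemctl", "is-active", name],
                capture_output=True,
                text=True,
                timeout=5,
            )
            # is-active prints one word such as "active", "inactive" or "failed".
            states[name] = result.stdout.strip() or "unknown"
        except (OSError, subprocess.TimeoutExpired):
            # systemctl is missing or did not answer in time.
            states[name] = "unavailable"
    return states


if __name__ == "__main__":
    for service, state in service_states(["filebeat", "logstash", "kibana"]).items():
        print(f"{service}: {state}")
```

Any state other than active is a cue to fall back to systemctl status and the service's log file for details.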
4.2 Configuration Review
Open /config/kibana.yml and verify critical settings such as elasticsearch.hosts: ["http://localhost:9200"] and server.host: "0.0.0.0". The latter binds Kibana to all network interfaces, which is what allows access from other hosts.
4.3 Log Review
Tail /logs/kibana.log for connection errors or startup failures.
tail -f /logs/kibana.log
4.4 UI Checks
Log into Kibana and confirm that the Discover, Dashboards, and Visualizations sections load and function correctly.
5. DSL Query Examples
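5.1 Slow Query Log
The query below returns the ten slowest entries from the previous day. It assumes the searched indices hold ingested slow-log events with a numeric took field; unmapped_type keeps the sort from failing on indices that lack that field.
GET /_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1d/d",
        "lt": "now/d"
      }
    }
  },
  "sort": [{ "took": { "order": "desc", "unmapped_type": "long" } }],
  "size": 10
}
5.2 Error Log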
GET /filebeat-*/_search
{
  "query": {
    "match": {
      "message": "error"
    }
  }
}
5.3 Node‑Specific Issues
GET /_search
{
  "query": {
    "term": {
      "host.name": {
        "value": "node-1"
      }
    }
  }
}
6. Enterprise Automation with Python
6.1 Metric Collection Script
The following Python script (requires the official elasticsearch Python client, version 8.x, for basic_auth and the .body attribute) gathers cluster health and node statistics, extracts CPU usage, load average, memory and heap usage, and disk availability, and appends the results as JSON to elasticsearch_metrics.log.
import configparser
import json
from datetime import datetime

from elasticsearch import Elasticsearch

LOG_FILE = 'elasticsearch_metrics.log'
GIB = 1024 ** 3  # bytes per GiB


def init_es_client(config_path='./conf/config.ini'):
    """Initialize and return an Elasticsearch client."""
    config = configparser.ConfigParser()
    config.read(config_path)
    es_host = config.get('elasticsearch', 'ES_HOST')
    es_user = config.get('elasticsearch', 'ES_USER')
    es_password = config.get('elasticsearch', 'ES_PASSWORD')
    return Elasticsearch(
        hosts=[es_host],
        basic_auth=(es_user, es_password),
        # Verify the server certificate against the cluster CA.
        verify_certs=True,
        ca_certs='conf/http_ca.crt',
    )


def get_cluster_metrics(es):
    """Collect cluster health plus per-node CPU, memory, heap, and disk metrics."""
    metrics = {'cluster_health': es.cluster.health().body, 'nodes': {}}
    node_stats = es.nodes.stats().body.get('nodes', {})
    for node_id, info in node_stats.items():
        fs_total = info['fs']['total']
        available = fs_total['available_in_bytes']
        total = fs_total['total_in_bytes']
        # Fall back to the node ID if the node reports no name.
        metrics['nodes'][info.get('name', node_id)] = {
            'cpu_usage': info['os']['cpu']['percent'],
            'load_average': info['os']['cpu'].get('load_average', {}).get('1m'),
            'memory_used': info['os']['mem']['used_percent'],
            'heap_used': info['jvm']['mem']['heap_used_percent'],
            'disk_available': available / GIB,
            'disk_total': total / GIB,
            'disk_usage_percent': 100 - available * 100 / total,
        }
    return metrics


def log_metrics(es):
    """Append a timestamped JSON snapshot of the metrics to LOG_FILE."""
    metrics = get_cluster_metrics(es)
    timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    with open(LOG_FILE, 'a') as f:
        f.write(f"Timestamp: {timestamp}\n")
        f.write(json.dumps(metrics, indent=4))
        f.write('\n')


if __name__ == "__main__":
    log_metrics(init_es_client())
    print("Elasticsearch cluster metrics logged successfully.")
6.2 Cron Scheduling
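Schedule the script to run daily at 06:00 using a crontab entry. Cron starts jobs in the user's home directory, and the script resolves ./conf/config.ini and elasticsearch_metrics.log relative to the working directory, so change into the script directory first:
0 6 * * * cd /home/user/scripts && /usr/bin/python3 es_metrics.py >> /home/user/scripts/es_metrics_cron.log 2>&1
7. Conclusion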
Applying these inspection steps and automating metric collection helps surface health and performance problems across the EFLK stack early, before they affect stability. Complementary observability tools such as Prometheus, Zabbix, or Grafana can further extend monitoring coverage.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
