Deep Inspection Checklist for Elasticsearch, Filebeat, Logstash, and Kibana

This article presents a comprehensive step‑by‑step guide for deeply inspecting the health, performance, and configuration of Elasticsearch, Filebeat, Logstash, and Kibana, including API calls, key metrics, DSL queries, Python automation scripts, and cron scheduling to ensure stable EFLK clusters.

Mingyi World Elasticsearch

Oct 28, 2024

Ensuring the stability of an Elasticsearch‑Filebeat‑Logstash‑Kibana (EFLK) stack is critical for both operations and development teams.

Elasticsearch Deep Inspection

1. Cluster health check

Use the GET _cluster/health API to retrieve overall health. Important fields include status (green/yellow/red), number_of_nodes, active_primary_shards, active_shards, and unassigned_shards.
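As a rough illustration of how these fields might be interpreted, the sketch below turns a health response into a list of warnings. The function name and the sample values are hypothetical; only the field names come from the API response described above.

```python
from typing import Dict, List


def summarize_cluster_health(health: Dict) -> List[str]:
    """Return human-readable warnings for a _cluster/health response body."""
    warnings = []
    status = health.get("status")
    if status == "red":
        warnings.append("status is red: at least one primary shard is unassigned")
    elif status == "yellow":
        warnings.append("status is yellow: at least one replica shard is unassigned")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned:
        warnings.append(f"{unassigned} unassigned shard(s)")
    return warnings


# Sample response trimmed to the fields named above (values are illustrative).
sample = {
    "status": "yellow",
    "number_of_nodes": 3,
    "active_primary_shards": 12,
    "active_shards": 20,
    "unassigned_shards": 4,
}
for line in summarize_cluster_health(sample):
    print(line)
```

An empty list means the cluster is green with no unassigned shards.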

2. Node performance monitoring

Query GET _nodes/stats and focus on indices.docs.count, indices.store.size_in_bytes, jvm.mem.heap_used_percent, os.cpu.percent, and fs.total.available_in_bytes.
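A minimal sketch of how those metrics could be checked against limits follows. The thresholds and the function name are assumptions chosen for illustration, not recommended values; the dictionary paths mirror the metric names listed above.

```python
from typing import Dict, List

# Hypothetical thresholds; tune them for your own cluster.
HEAP_LIMIT_PERCENT = 85
CPU_LIMIT_PERCENT = 90
MIN_FREE_DISK_RATIO = 0.15


def find_hot_nodes(node_stats: Dict) -> List[str]:
    """Return one alert string per threshold breach in a _nodes/stats body."""
    alerts = []
    for node in node_stats.get("nodes", {}).values():
        name = node.get("name", "unknown")
        heap = node["jvm"]["mem"]["heap_used_percent"]
        cpu = node["os"]["cpu"]["percent"]
        fs_total = node["fs"]["total"]
        free_ratio = fs_total["available_in_bytes"] / fs_total["total_in_bytes"]
        if heap > HEAP_LIMIT_PERCENT:
            alerts.append(f"{name}: heap at {heap}%")
        if cpu > CPU_LIMIT_PERCENT:
            alerts.append(f"{name}: CPU at {cpu}%")
        if free_ratio < MIN_FREE_DISK_RATIO:
            alerts.append(f"{name}: only {free_ratio:.0%} disk free")
    return alerts


# Illustrative response trimmed to the fields used above.
sample_stats = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "jvm": {"mem": {"heap_used_percent": 91}},
            "os": {"cpu": {"percent": 35}},
            "fs": {"total": {"available_in_bytes": 50, "total_in_bytes": 1000}},
        }
    }
}
print(find_hot_nodes(sample_stats))
```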

3. Shard status monitoring

Run

GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason

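and verify that the unassigned.reason column is empty for every shard. If a shard stays unassigned, first ask the cluster why with GET _cluster/allocation/explain and, once the cause is fixed, retry allocation with POST _cluster/reroute?retry_failed=true. Only as a last resort, when no healthy copy of the primary exists, force-allocate a stale copy. The command below sets accept_data_loss to true, so any writes the stale copy never received are lost: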

POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "your-index",
        "shard": 0,
        "node": "node-name",
        "accept_data_loss": true
      }
    }
  ]
}
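Before rerouting anything, it can help to see why shards are unassigned. The sketch below assumes the _cat/shards request above was made with format=json added, so each row arrives as a dictionary keyed by column name; the function name and sample rows are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def unassigned_by_reason(rows: Iterable[Dict]) -> Counter:
    """Count UNASSIGNED shards per unassigned.reason value."""
    return Counter(
        row.get("unassigned.reason") or "UNKNOWN"
        for row in rows
        if row.get("state") == "UNASSIGNED"
    )


# Illustrative rows in the shape returned when format=json is requested.
sample_rows = [
    {"index": "logs-1", "shard": "0", "prirep": "p", "state": "STARTED", "unassigned.reason": None},
    {"index": "logs-1", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
    {"index": "logs-2", "shard": "1", "prirep": "r", "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
    {"index": "logs-3", "shard": "0", "prirep": "p", "state": "UNASSIGNED", "unassigned.reason": "INDEX_CREATED"},
]
print(unassigned_by_reason(sample_rows))
```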

4. Index status inspection

Execute

GET _cat/indices?v&h=index,health,status,pri,rep,docs.count,store.size

and review fields such as health, status, pri, rep, docs.count, and store.size.
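A small sketch of filtering that output is shown here, assuming the request is made with format=json so rows come back as dictionaries. The function name and sample rows are made up for illustration.

```python
from typing import Dict, Iterable, List


def indices_needing_attention(rows: Iterable[Dict]) -> List[str]:
    """Return names of indices that are not green or not open."""
    return [
        row["index"]
        for row in rows
        if row.get("health") != "green" or row.get("status") != "open"
    ]


# Illustrative rows; _cat APIs return every value as a string.
sample_indices = [
    {"index": "filebeat-2024.10.27", "health": "green", "status": "open", "pri": "1", "rep": "1"},
    {"index": "filebeat-2024.10.28", "health": "yellow", "status": "open", "pri": "1", "rep": "1"},
    {"index": "archive-2023", "health": "green", "status": "close", "pri": "1", "rep": "0"},
]
print(indices_needing_attention(sample_indices))
```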

5. Cluster performance analysis with profile queries

Add "profile": true to the search request body to obtain timing for each query component on each shard, which helps locate bottlenecks.

GET /your-index/_search?pretty
{
  "profile": true,
  "query": {
    "match": {"field": "value"}
  }
}

The response includes timing for each shard‑level operation.
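Profile output is verbose, so a short summary can make the slowest shards easier to spot. This sketch adds up the top-level query timings per shard; the function name and the trimmed sample response are illustrative, and a real response carries many more fields (breakdown, children, collector, and so on).

```python
from typing import Dict, List, Tuple


def slowest_shards(response: Dict, top: int = 3) -> List[Tuple[str, float]]:
    """Return (shard id, query time in ms) pairs, slowest first."""
    totals = []
    for shard in response.get("profile", {}).get("shards", []):
        nanos = sum(
            query["time_in_nanos"]
            for search in shard.get("searches", [])
            for query in search.get("query", [])
        )
        totals.append((shard["id"], nanos / 1_000_000))
    return sorted(totals, key=lambda item: item[1], reverse=True)[:top]


# Trimmed, illustrative profile section of a search response.
sample_response = {
    "profile": {
        "shards": [
            {
                "id": "[n2][your-index][1]",
                "searches": [{"query": [
                    {"type": "TermQuery", "time_in_nanos": 1_500_000},
                    {"type": "TermQuery", "time_in_nanos": 500_000},
                ]}],
            },
            {
                "id": "[n1][your-index][0]",
                "searches": [{"query": [
                    {"type": "TermQuery", "time_in_nanos": 4_000_000},
                ]}],
            },
        ]
    }
}
print(slowest_shards(sample_response))
```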

Filebeat Inspection

1. Configuration check

Verify installation with systemctl status filebeat and inspect /etc/filebeat/filebeat.yml for correct input sources and output destinations.

2. Log inspection

Tail the log file (/var/log/filebeat/filebeat) and confirm there are no errors such as connection failures or permission issues.

tail -f /var/log/filebeat/filebeat

3. Configuration testing

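Run filebeat test config to validate the configuration file; the -e flag sends log output to stderr. To also check connectivity to the configured output, run filebeat test output.

filebeat test config -e
filebeat test output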

Logstash Inspection

1. Process check

Confirm Logstash is running via systemctl status logstash.

systemctl status logstash

2. Pipeline configuration check

Review files under /etc/logstash/conf.d/ and ensure correct input, filter, and output sections.

cat /etc/logstash/conf.d/*.conf

3. Log inspection

Tail /var/log/logstash/logstash-plain.log and look for connection failures or Grok parsing errors.

tail -f /var/log/logstash/logstash-plain.log
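Instead of watching the tail by eye, warnings and errors can be tallied per logger. The sketch below assumes the default [timestamp][LEVEL][logger] line layout of logstash-plain.log. The sample lines are invented to match that layout and are not real Logstash output, so adjust the pattern if your log4j2 configuration differs.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed layout: [timestamp][LEVEL][logger] message (level and logger may be padded).
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>[A-Z]+)\s*\]\[(?P<logger>[^\]\s]+)\s*\]"
)


def count_problems(lines: Iterable[str]) -> Counter:
    """Count WARN/ERROR/FATAL lines keyed by (level, logger)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") in {"WARN", "ERROR", "FATAL"}:
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts


# Invented sample lines in the assumed layout.
sample_lines = [
    "[2024-10-28T06:00:01,123][INFO ][logstash.agent           ] pipelines running",
    "[2024-10-28T06:00:02,456][ERROR][logstash.outputs.elasticsearch][main] connection refused",
    "[2024-10-28T06:00:03,789][WARN ][logstash.filters.grok    ][main] pattern timed out",
]
for key, count in count_problems(sample_lines).items():
    print(key, count)
```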

Kibana Inspection

1. Process status

Check Kibana service with systemctl status kibana.

systemctl status kibana
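The process checks for Filebeat, Logstash, and Kibana can be combined into one helper. This is a sketch that shells out to systemctl is-active, which prints a one-word state such as active or inactive. The function names are hypothetical, and the probe is injectable so the logic can be exercised on machines without systemd.

```python
import subprocess
from typing import Callable, Dict, Iterable


def systemctl_is_active(unit: str) -> str:
    """Return the state printed by `systemctl is-active <unit>`."""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", unit],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return "unknown"  # systemctl is not installed on this host
    return result.stdout.strip() or "unknown"


def check_services(
    units: Iterable[str],
    probe: Callable[[str], str] = systemctl_is_active,
) -> Dict[str, str]:
    """Map each unit name to its reported state."""
    return {unit: probe(unit) for unit in units}


# Demonstration with a stub probe, so no real services are queried here.
fake_states = {"filebeat": "active", "logstash": "failed", "kibana": "active"}
report = check_services(fake_states, probe=fake_states.get)
down = [unit for unit, state in report.items() if state != "active"]
print("not running:", down)
```

On a real host, calling check_services(["filebeat", "logstash", "kibana"]) without the probe argument queries systemd directly.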

2. Configuration check

Inspect config/kibana.yml in the Kibana home directory (package installs typically use /etc/kibana/kibana.yml) for correct elasticsearch.hosts and server.host settings.

cat config/kibana.yml

3. Log inspection

Tail logs/kibana.log in the Kibana home directory and verify connectivity to Elasticsearch and successful startup.

tail -f logs/kibana.log
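Kibana also exposes a status endpoint at /api/status that can be polled alongside the log check. The sketch below only interprets an already-fetched response. The field names are assumptions based on 8.x responses (status.overall.level) and 7.x responses (status.overall.state), so verify them against your version before relying on this.

```python
from typing import Dict


def kibana_is_available(status_body: Dict) -> bool:
    """Interpret a Kibana /api/status response body (field names assumed)."""
    overall = status_body.get("status", {}).get("overall", {})
    # Assumed: 8.x reports level == "available", 7.x reports state == "green".
    return overall.get("level") == "available" or overall.get("state") == "green"


# Illustrative bodies trimmed to the fields inspected above.
print(kibana_is_available({"status": {"overall": {"level": "available"}}}))
print(kibana_is_available({"status": {"overall": {"level": "degraded"}}}))
```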

4. UI verification

Discover: search and view log data.

Dashboards: ensure proper rendering.

Visualizations: confirm charts load correctly.

DSL Query Examples

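1. Slow query logs

This query covers all of yesterday and assumes slow logs have been ingested with a numeric took field. The unmapped_type setting keeps the sort from failing on indices that lack that field.

GET /_search
{
  "query": {"range": {"@timestamp": {"gte": "now-1d/d", "lt": "now/d"}}},
  "sort": [{"took": {"order": "desc", "unmapped_type": "long"}}],
  "size": 10
}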

2. Error logs

GET /filebeat-*/_search
{
  "query": {"match": {"message": "error"}}
}
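In practice the error search is usually limited to a recent window and sorted newest first. The sketch below builds such a request body as a Python dictionary. The function name, the 15-minute default, and the size are illustrative choices rather than part of the original query.

```python
import json
from typing import Dict


def build_error_query(minutes: int = 15, size: int = 20) -> Dict:
    """Build a search body for recent documents whose message matches 'error'."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                # Scored full-text match on the message field.
                "must": [{"match": {"message": "error"}}],
                # Unscored time filter relative to now.
                "filter": [{"range": {"@timestamp": {"gte": f"now-{minutes}m"}}}],
            }
        },
    }


body = build_error_query()
print(json.dumps(body, indent=2))
```

The printed body can be sent to GET /filebeat-*/_search unchanged.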

3. Node‑specific performance logs

GET /_search
{
  "query": {"term": {"host.name": {"value": "node-1"}}}
}

Enterprise‑Level Automation: Monitoring Elasticsearch with Python

1. Metric collection script

The script uses the official elasticsearch Python client to fetch cluster health and node statistics, then appends them to a log file as JSON. Connection settings are read from conf/config.ini.

import json
from datetime import datetime
import configparser
from elasticsearch import Elasticsearch

def init_es_client(config_path='./conf/config.ini'):
    """Initialize and return an Elasticsearch client"""
    # Read host and credentials from the [elasticsearch] section
    config = configparser.ConfigParser()
    config.read(config_path)
    es_host = config.get('elasticsearch', 'ES_HOST')
    es_user = config.get('elasticsearch', 'ES_USER')
    es_password = config.get('elasticsearch', 'ES_PASSWORD')
    es = Elasticsearch(
        hosts=[es_host],
        basic_auth=(es_user, es_password),
        # Verify the server certificate against the cluster CA;
        # verify_certs=False would disable verification and make ca_certs pointless.
        verify_certs=True,
        ca_certs='conf/http_ca.crt'
    )
    return es

LOG_FILE = 'elasticsearch_metrics.log'
es = init_es_client()

def get_cluster_health():
    return es.cluster.health().body

def get_node_stats():
    return es.nodes.stats().body

def get_cluster_metrics():
    metrics = {}
    cluster_health = get_cluster_health()
    metrics['cluster_health'] = cluster_health
    node_stats = get_node_stats()
    nodes = node_stats.get('nodes', {})
    metrics['nodes'] = {}
    for node_id, node_info in nodes.items():
        node_name = node_info.get('name')
        metrics['nodes'][node_name] = {
            'cpu_usage': node_info['os']['cpu']['percent'],
            'load_average': node_info['os']['cpu'].get('load_average', {}).get('1m'),
            'memory_used': node_info['os']['mem']['used_percent'],
            'heap_used': node_info['jvm']['mem']['heap_used_percent'],
            'disk_available': node_info['fs']['total']['available_in_bytes'] / (1024 ** 3),
            'disk_total': node_info['fs']['total']['total_in_bytes'] / (1024 ** 3),
            'disk_usage_percent': 100 - (node_info['fs']['total']['available_in_bytes'] * 100 / node_info['fs']['total']['total_in_bytes'])
        }
    return metrics

def log_metrics():
    metrics = get_cluster_metrics()
    timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    with open(LOG_FILE, 'a') as f:
        f.write(f"Timestamp: {timestamp}\n")
        f.write(json.dumps(metrics, indent=4))
        f.write('\n\n')

if __name__ == "__main__":
    log_metrics()
    print("Elasticsearch cluster metrics logged successfully.")
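The log format above (a timestamp line followed by indented JSON) is easy to read but awkward to parse later. One possible alternative, sketched here, writes each run as a single JSON line so the file can be re-ingested by Filebeat or read back line by line. The function name and the @timestamp key are assumptions, not part of the original script.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Optional


def format_record(metrics: Dict, now: Optional[datetime] = None) -> str:
    """Serialize one metrics snapshot as a single JSON line."""
    now = now or datetime.now(timezone.utc)
    record = {"@timestamp": now.isoformat(), **metrics}
    return json.dumps(record, sort_keys=True)


# Illustrative snapshot in the shape produced by get_cluster_metrics().
snapshot = {"cluster_health": {"status": "green"}, "nodes": {}}
line = format_record(snapshot, datetime(2024, 10, 28, 6, 0, tzinfo=timezone.utc))
print(line)
```

Appending line plus a newline to the log file keeps one record per line.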

2. Scheduling with cron

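Place the script at /home/user/scripts/es_metrics.py and add the following line with crontab -e to run it daily at 06:00. The cd is needed because the script opens conf/config.ini and its log file by relative path, and cron does not start jobs in the script's directory:

0 6 * * * cd /home/user/scripts && /usr/bin/python3 es_metrics.py >> /home/user/scripts/es_metrics_cron.log 2>&1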
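Cron typically starts jobs in the user's home directory, so the script's relative paths (conf/config.ini, conf/http_ca.crt, the log file) resolve against that directory rather than the script's own. One way to make the script independent of the working directory is to anchor those paths to the script's location. The helper below is a sketch with a hypothetical name.

```python
from pathlib import Path


def beside_script(script_file: str, relative: str) -> Path:
    """Resolve a path relative to the directory containing script_file."""
    return Path(script_file).absolute().parent / relative


# Inside es_metrics.py this would be called with __file__ as the first argument.
config_path = beside_script("/home/user/scripts/es_metrics.py", "conf/config.ini")
print(config_path)
```

With this in place, init_es_client(config_path=str(beside_script(__file__, "conf/config.ini"))) finds the configuration regardless of where cron starts the job.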

Conclusion

This deep-inspection checklist covers every EFLK component: cluster health, node performance, shard allocation, and index status for Elasticsearch, plus service health, configuration, and logs for Filebeat, Logstash, and Kibana. Regular manual checks, combined with Python-driven metrics collection scheduled through cron, help detect issues early and keep the stack running smoothly.
