Master ELK: Build a Scalable Log Management System with Elasticsearch, Logstash, Kibana

This guide introduces the ELK stack (Elasticsearch, Logstash, and Kibana, plus the Filebeat shipper), explains why centralized log management matters, compares three architecture options, and walks through installation and configuration, including a Kafka-backed pipeline.

Open Source Linux

May 16, 2022

1.1 ELK Overview

ELK is an acronym for three open-source projects: Elasticsearch, Logstash, and Kibana. Together with Beats they are now usually called the Elastic Stack. Filebeat, a lightweight Beats shipper, can take over log collection from Logstash.

Filebeat forwards and centralizes log data. It monitors specified log files, reads new entries, and ships events to Elasticsearch or Logstash for indexing.

Logstash is a free, open‑source server‑side data‑processing pipeline that can ingest data from multiple sources, transform it, and forward it to your chosen storage.

Elasticsearch is the distributed search and analytics engine at the core of the Elastic Stack. Built on Lucene, it provides near‑real‑time search and analysis for structured, unstructured, numeric, and geospatial data.

Kibana is an open‑source analytics and visualization platform for Elasticsearch. It offers dashboards, charts, and a web UI for exploring and visualizing indexed data.

1.2 Why Use ELK

Logs (system, application, security) give operators insight into server health, configuration errors, and performance. Centralized log management simplifies collection, storage, and analysis across dozens or hundreds of machines, improving troubleshooting efficiency.

1.3 Core Features of a Complete Log System

Collection: gather logs from diverse sources.

Transport: reliably parse, filter, and forward logs to storage.

Storage: persist log data.

Analysis: provide UI‑based analytics.

Alerting: generate error reports and monitoring alerts.

2 ELK Architecture Analysis

2.1 Beats + Elasticsearch + Kibana (Simple)

This basic stack consists of Beats (typically Filebeat) for log shipping, Elasticsearch for storage/search, and Kibana for visualization. Suitable for simple log data and testing; production environments should add Logstash.

2.2 Beats + Logstash + Elasticsearch + Kibana

Adding Logstash brings:

Disk‑based adaptive buffering to absorb bursts.

Ability to ingest from databases, S3, message queues, etc.

Multi‑destination output (e.g., S3, HDFS, files).

Conditional pipeline logic for complex processing.

Combining Filebeat with Logstash adds horizontal scalability, high availability, at-least-once delivery, and secure end-to-end transport: TLS encrypts the traffic, while basic auth, LDAP, and similar mechanisms handle authentication.
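To make the at-least-once guarantee concrete, here is a minimal Python sketch of a sender that keeps resending an event until it is acknowledged. It is a toy model, not the Beats protocol: `FlakyReceiver`, `send_at_least_once`, and `deduplicate` are hypothetical names, and the lost acknowledgement is simulated.

```python
from typing import List


class FlakyReceiver:
    """Toy receiver: stores every event, but the first ack for each one is lost."""

    def __init__(self) -> None:
        self.stored: List[str] = []
        self._seen = set()

    def deliver(self, event: str) -> bool:
        """Store the event; return True only if the ack reaches the sender."""
        self.stored.append(event)
        if event not in self._seen:
            self._seen.add(event)
            return False  # simulated lost acknowledgement
        return True


def send_at_least_once(events: List[str], receiver: FlakyReceiver,
                       max_attempts: int = 5) -> None:
    """Resend each event until it is acknowledged or attempts run out."""
    for event in events:
        for _ in range(max_attempts):
            if receiver.deliver(event):
                break
        else:
            raise RuntimeError(f"gave up on {event!r}")


def deduplicate(events: List[str]) -> List[str]:
    """Drop repeats while keeping first-seen order."""
    return list(dict.fromkeys(events))
```

Nothing is lost, but the receiver can hold duplicates whenever an acknowledgement goes missing. This is why at-least-once pipelines usually rely on some form of downstream de-duplication.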

2.3 Beats + Cache/MQ + Logstash + Elasticsearch + Kibana

Introducing a middleware layer (Redis, Kafka, RabbitMQ) between Beats and Logstash reduces load on log‑generating servers, buffers data to protect Elasticsearch from write spikes, and centralizes formatting and processing.
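The buffering effect can be shown with a small simulation. The Python sketch below is an illustration under simplified assumptions (discrete ticks, a fixed consumer rate, an unbounded in-memory buffer), and `simulate_buffer` is a hypothetical helper rather than anything from Kafka or Redis.

```python
from collections import deque
from typing import Deque, List, Tuple


def simulate_buffer(arrivals: List[int], drain_per_tick: int) -> Tuple[List[int], int]:
    """Return (events written downstream per tick, peak backlog in the buffer).

    arrivals[i] is how many events the shippers produce in tick i;
    drain_per_tick is the fixed rate the consumer can sustain.
    """
    buffer: Deque[int] = deque()
    written: List[int] = []
    peak = 0
    next_id = 0

    def drain() -> None:
        batch = min(drain_per_tick, len(buffer))
        for _ in range(batch):
            buffer.popleft()
        written.append(batch)

    for count in arrivals:
        for _ in range(count):
            buffer.append(next_id)
            next_id += 1
        peak = max(peak, len(buffer))
        drain()

    while buffer:  # keep draining after the producers go quiet
        drain()
    return written, peak
```

With `simulate_buffer([10, 0, 0, 2], 4)` the burst of ten events is held in the buffer, and the downstream side never sees more than four writes per tick.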

3 ELK Deployment

3.1 Installing Filebeat

3.1.1 Principle

Filebeat starts one or more inputs that watch specified log locations. For each discovered log, a harvester reads new lines and forwards events to libbeat, which then ships them to the configured output.
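The harvester idea, reading only what was appended since the last pass, can be sketched in a few lines of Python. This is an illustration, not Filebeat's implementation: `harvest` is a hypothetical function, and the offset is kept by the caller rather than persisted to disk.

```python
from typing import List, Tuple


def harvest(path: str, offset: int) -> Tuple[List[str], int]:
    """Read complete lines added to `path` since `offset`.

    Returns the new lines and the offset to resume from next time.
    """
    lines: List[str] = []
    with open(path, "rb") as handle:
        handle.seek(offset)
        while True:
            raw = handle.readline()
            if not raw.endswith(b"\n"):
                break  # end of file or a partial line: retry on the next pass
            lines.append(raw.rstrip(b"\r\n").decode("utf-8", errors="replace"))
            offset += len(raw)
    return lines, offset
```

Calling `harvest` repeatedly with the returned offset yields each complete line once, and a line that is still being written is picked up only after its newline arrives.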

3.1.2 Simple Installation

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.7.0-linux-x86_64.tar.gz
tar -xzvf filebeat-7.7.0-linux-x86_64.tar.gz

The extracted directory contains filebeat.reference.yml, which lists all non-deprecated options and serves as a reference. Put your own settings in filebeat.yml and start Filebeat with ./filebeat -e, which sends its log output to stderr.

3.2 Installing Logstash

3.2.1 Basic Principle

Logstash pipelines consist of mandatory inputs, optional filters, and mandatory outputs. Each input runs in its own thread, feeding events into an internal queue; filters process the events; outputs write them to the destination.
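The input, queue, filter, and output stages can be modeled with Python threads and a queue. This is a deliberately small sketch of the data flow described above, not Logstash's execution model: `run_pipeline` is a hypothetical function, and it uses a single worker for filtering and output.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, List, Optional

Event = Dict[str, str]
_STOP = object()  # sentinel an input puts on the queue when it is exhausted


def run_pipeline(
    inputs: List[Iterable[str]],
    filters: List[Callable[[Event], Optional[Event]]],
    output: Callable[[Event], None],
) -> None:
    """Run one thread per input; filter and output events from a shared queue."""
    events: "queue.Queue[object]" = queue.Queue(maxsize=100)

    def read(source: Iterable[str]) -> None:
        for line in source:
            events.put({"message": line})
        events.put(_STOP)

    threads = [threading.Thread(target=read, args=(src,)) for src in inputs]
    for thread in threads:
        thread.start()

    remaining = len(threads)
    while remaining:
        item = events.get()
        if item is _STOP:
            remaining -= 1
            continue
        event = item
        for apply in filters:
            event = apply(event)
            if event is None:  # a filter may drop the event
                break
        if event is not None:
            output(event)

    for thread in threads:
        thread.join()
```

The filter list may be empty, which mirrors the fact that filters are the only optional stage. The bounded queue makes a fast input block until the worker catches up.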

3.2.2 Simple Installation

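Download the archive from the Logstash download page (or the Chinese mirror). Logstash 7.7.0 does not bundle a JDK (bundling started with 7.10), so make sure a supported JDK such as Java 8 or Java 11 is installed. Extract the archive:

tar -zxvf logstash-7.7.0.tar.gz

Then test with a HelloWorld pipeline: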

./bin/logstash -e 'input { stdin { } } output { stdout {} }'

3.3 Installing Elasticsearch

3.3.1 Overview

Elasticsearch is a distributed, RESTful search and analytics engine built on Lucene. It supports horizontal scaling, full‑text search, near‑real‑time analytics, high availability, dynamic mapping, and a JSON‑over‑HTTP API.

3.3.2 Linux System Settings

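ulimit -n 65535                                           # file descriptor limit for the current shell only
echo "* soft nofile 65535" >> /etc/security/limits.conf   # persistent soft limit, applied at next login
echo "* hard nofile 65535" >> /etc/security/limits.conf   # hard limit must not be lower than the soft limit
sysctl -w vm.max_map_count=262144                         # required by Elasticsearch; lost on reboot
echo "vm.max_map_count=262144" >> /etc/sysctl.conf        # persist the setting across reboots
sysctl -w vm.swappiness=1                                 # optional: reduce swapping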
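A quick preflight check can confirm that the settings took effect before Elasticsearch is started. The Python sketch below is one possible way to do it on Linux. `find_problems` and `read_current` are hypothetical helpers, and the thresholds are simply the values used in the commands above.

```python
import resource
from typing import List, Tuple

MIN_NOFILE = 65535          # file descriptor limit used in the commands above
MIN_MAX_MAP_COUNT = 262144  # vm.max_map_count used in the commands above


def find_problems(nofile_soft: int, max_map_count: int) -> List[str]:
    """Return human-readable problems; an empty list means the checks pass."""
    problems = []
    unlimited = nofile_soft == resource.RLIM_INFINITY
    if not unlimited and nofile_soft < MIN_NOFILE:
        problems.append(f"nofile soft limit is {nofile_soft}, want >= {MIN_NOFILE}")
    if max_map_count < MIN_MAX_MAP_COUNT:
        problems.append(
            f"vm.max_map_count is {max_map_count}, want >= {MIN_MAX_MAP_COUNT}"
        )
    return problems


def read_current() -> Tuple[int, int]:
    """Read the live values for this process (Linux only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    with open("/proc/sys/vm/max_map_count") as handle:
        return soft, int(handle.read().strip())
```

Run `find_problems(*read_current())` as the user that will start Elasticsearch, since the file descriptor limit is per user session.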

3.3.3 Elasticsearch Installation

groupadd elastic
useradd elk -d /data/hd05/elk -g elastic
echo "2edseoir@" | passwd elk --stdin    # example password; replace it with your own
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.7.0-linux-x86_64.tar.gz
tar -zxvf elasticsearch-7.7.0-linux-x86_64.tar.gz
ln -s elasticsearch-7.7.0 es             # version-independent path to the install

Configure elasticsearch.yml (cluster name, node roles, paths, network host, discovery hosts, security settings, etc.). Start Elasticsearch as the elk user, because it refuses to run as root: use ./bin/elasticsearch -d to run it as a daemon, or omit -d to keep it in the foreground.

3.3.4 Setting Up Passwords

./bin/elasticsearch-setup-passwords interactive

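Follow the prompts to set passwords for the built-in users (elastic, kibana, logstash_system, beats_system, and others). The tool needs a running node with xpack.security.enabled: true set in elasticsearch.yml.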

3.4 Installing Kibana

Download from the Elastic website, extract, and edit kibana.yml:

server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://192.168.110.130:9200", "http://192.168.110.131:9200", "http://192.168.110.132:9200"]
elasticsearch.username: "elastic"
elasticsearch.password: "password"

Start with ./bin/kibana and access http://192.168.110.130:5601/ using the credentials set earlier.

4 Example Pipeline

We build a pipeline: Beats → Kafka (as buffer) → Logstash → Elasticsearch → Kibana.

4.1 Filebeat Configuration (Kafka output)

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /data/elk/logstash-tutorial.log
output.kafka:
  hosts: ["192.168.110.130:9092"]
  topic: 'filebeat_test'
  compression: gzip
  required_acks: 1

Start Filebeat in background:

cd filebeat-7.7.0-linux-x86_64 && nohup ./filebeat -e &

4.2 Logstash Configuration (Apache log parsing)

input {
  kafka {
    bootstrap_servers => "192.168.110.130:9092"
    topics => ["filebeat_test"]
    group_id => "test123"
    auto_offset_reset => "earliest"
  }
}
filter {
  json { source => "message" }
  grok { match => { "message" => "%{COMBINEDAPACHELOG}" } remove_field => "message" }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["192.168.110.130:9200", "192.168.110.131:9200", "192.168.110.132:9200"]
    index => "test_kafka"
    user => "elastic"
    password => "${ES_PWD}"
  }
}
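To see what the json and grok filters do to an event, the following Python sketch mimics the two steps on a single Kafka record. It rests on several assumptions: the regular expression is a simplified stand-in for COMBINEDAPACHELOG rather than the grok pattern itself, `process` is a hypothetical function, and the field names follow the legacy (non-ECS) grok names.

```python
import json
import re
from typing import Any, Dict

# Simplified stand-in for the COMBINEDAPACHELOG grok pattern (assumption).
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def process(kafka_value: str) -> Dict[str, Any]:
    """Mimic json { source => "message" } followed by the grok filter."""
    # Filebeat publishes each event as a JSON document; the raw log line
    # sits in that document's "message" field.
    event = json.loads(kafka_value)
    match = COMBINED.match(event.get("message", ""))
    if match is None:
        # grok tags events it cannot parse and leaves them otherwise intact
        event.setdefault("tags", []).append("_grokparsefailure")
        return event
    event.update(match.groupdict())
    del event["message"]  # remove_field => "message"
    return event
```

An access-log line wrapped in a Filebeat-style JSON envelope comes out with separate clientip, verb, request, response, and agent fields and no message field. A line that does not match keeps its message and gains the _grokparsefailure tag.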

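Run Logstash in the background. The ${ES_PWD} reference is resolved from the Logstash keystore or, failing that, from an environment variable, so define it first: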

cd logstash-7.7.0 && nohup ./bin/logstash -f conf.d/apache.conf &

4.3 Verify in Elasticsearch and Kibana

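Query the Elasticsearch API to confirm that the test_kafka index exists. With security enabled, pass credentials, and quote the URL so the shell does not interpret the ? character:

curl -u elastic 'http://192.168.110.130:9200/_cat/indices?v'

Then create an index pattern for test_kafka in Kibana and explore the data in Discover or on dashboards.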
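The same check can be scripted. The Python sketch below assumes the `format=json` variant of the `_cat/indices` endpoint and HTTP basic authentication. `index_names` and `fetch_cat_indices` are hypothetical helper names, and the host and credentials passed to them would be your own.

```python
import base64
import json
import urllib.request
from typing import List


def index_names(cat_indices_json: str) -> List[str]:
    """Extract index names from the body of GET /_cat/indices?format=json."""
    return [row["index"] for row in json.loads(cat_indices_json)]


def fetch_cat_indices(base_url: str, user: str, password: str) -> str:
    """Fetch the index listing as JSON text using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{base_url}/_cat/indices?format=json",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

For example, `"test_kafka" in index_names(fetch_cat_indices("http://192.168.110.130:9200", "elastic", password))` would report whether the pipeline has created the index, assuming the cluster is reachable from where the script runs.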

Original source: https://www.cnblogs.com/zsql/p/13164414.html
