Comprehensive Guide to Building an ELK Log Management Platform with Kafka and Filebeat

This article provides a detailed tutorial on designing, deploying, and operating an ELK log management platform—including Elasticsearch, Logstash, Kibana, Kafka, and Filebeat—covering architecture options, configuration files, command‑line operations, cluster setup, and best‑practice recommendations for scalable, real‑time log collection and analysis.

Big Data Technology & Architecture

Jul 28, 2019

ELK Architecture Classification

ELK consists of three components: Elasticsearch (distributed search engine), Logstash (data collection and processing engine), and Kibana (visualization platform). Together they provide a powerful, open‑source log analysis system.

1. Simplest ELK Architecture

Logstash agents on each node collect logs, filter them, and send them to a remote Elasticsearch server for storage. Kibana visualizes the data.

2. ELK Architecture with Kafka

A Kafka message queue buffers logs between Logstash agents and the Elasticsearch cluster, improving reliability and preventing data loss.
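
To make the buffering role concrete, here is a minimal Python sketch. It is not Kafka itself: an in-memory deque stands in for a topic, and the function name simulate_buffer and all numbers are illustrative assumptions. It shows a burst of events being absorbed and then drained at the consumer's own pace without loss.

```python
from collections import deque

def simulate_buffer(arrivals, drain_rate):
    """Toy model of a queue between log shippers and an indexer.

    arrivals:   number of events produced at each tick
    drain_rate: maximum events the consumer can process per tick
    Returns (backlog after each tick, total events delivered).
    """
    queue = deque()
    backlog = []
    delivered = 0
    for n in arrivals:
        queue.extend(range(n))  # shippers append a burst of n events
        for _ in range(min(drain_rate, len(queue))):
            queue.popleft()     # consumer drains at its own pace
            delivered += 1
        backlog.append(len(queue))
    return backlog, delivered

backlog, delivered = simulate_buffer([10, 0, 0, 0], drain_rate=4)
print(backlog, delivered)
```

A real Kafka cluster additionally persists the backlog to disk and replicates it across brokers, which is what allows the indexing tier to fall behind or restart without dropping events.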

3. Filebeat + Kafka + ELK Cluster Architecture

Filebeat replaces Logstash agents for lightweight log shipping, Kafka provides buffering, and both Logstash and Elasticsearch run in cluster mode for high scalability.

Filebeat Service Setup

Filebeat is a Go‑based lightweight agent that consumes fewer resources than Logstash.

1. filebeat.yml Configuration

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /wls/applogs/rtlog/app.log
  fields:
    log_topic: appName
  multiline:
    pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:'
    negate: false
    match: after
output.kafka:
  enabled: true
  hosts: ["kafka-1:9092","kafka-2:9092"]
  topic: app.log
  version: "0.10.2.0"
  compression: gzip
processors:
- drop_fields:
    fields: ["beat", "input", "source", "offset"]
logging.level: error
name: app-server-ip
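
The multiline settings are easier to reason about with a small simulation. The following is a hedged Python sketch of how negate: false with match: after behaves: any line matching the pattern is appended to the line before it. The regex is a Python translation of a Java stack-trace continuation pattern ([[:space:]] becomes \s, and whitespace is required after at or ... so that "... 12 more" lines also match); the helper name group_multiline and the sample log lines are assumptions for illustration.

```python
import re

# Python translation of the continuation pattern; [[:space:]] becomes \s.
CONTINUATION = re.compile(r"^\s+(at|\.{3})\s+\b|^Caused by:")

def group_multiline(lines):
    """Mimic negate: false / match: after.

    A line that matches the pattern is appended to the previous event;
    any other line starts a new event.
    """
    events = []
    for line in lines:
        if events and CONTINUATION.search(line):
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events

sample = [
    'Exception in thread "main" java.lang.IllegalStateException: boom',
    "    at com.example.App.run(App.java:42)",
    "    ... 12 more",
    "Caused by: java.io.IOException: disk full",
    "    at com.example.Disk.write(Disk.java:7)",
    "2019-07-28 10:00:01 INFO recovered",
]
events = group_multiline(sample)
print(len(events), "events")
```

With this sample, the whole stack trace becomes one event and the following INFO line becomes a second one, which is the shape Filebeat would ship to Kafka.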

2. Common Operations Commands

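Start in the foreground:

./filebeat -e -c filebeat.yml

Start in the background:

nohup ./filebeat -e -c filebeat.yml &

Start in the background and discard output:

nohup ./filebeat -e -c filebeat.yml >/dev/null 2>&1 &

Stop: find the process ID, then terminate it. A plain kill (SIGTERM) lets Filebeat shut down cleanly; use kill -9 only if the process does not exit.

ps -ef | grep filebeat
kill <PID>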

3. Filebeat Debugging

Run Filebeat in the foreground (./filebeat -e -c filebeat.yml) to watch its log output, then confirm that events reach Kafka by reading the topic with the console consumer:

./kafka-console-consumer.sh --zookeeper zk-1:2181,zk-2:2181 --topic app.log

Kafka Cluster Setup

A typical Kafka cluster includes multiple producers, brokers, consumer groups, and a Zookeeper ensemble for coordination.

1. Kafka Configuration (server.properties)

broker.id=1
port=9092
host.name=192.168.0.1
num.replica.fetchers=1
log.dirs=/opt/kafka_logs
num.partitions=3
zookeeper.connect=zk-1:2181,zk-2:2181,zk-3:2181
zookeeper.connection.timeout.ms=6000
zookeeper.sync.time.ms=2000
num.io.threads=8
num.network.threads=8
queued.max.requests=16
delete.topic.enable=true

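Key parameters: num.partitions sets the default partition count for automatically created topics. It should be at least the number of consumers in a group, because each partition is read by only one consumer per group and any extra consumers sit idle. delete.topic.enable=true allows topics to be physically deleted; without it, a delete request only marks the topic for deletion.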
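
The relationship between partition count and consumer count can be sketched in a few lines of Python. This is a simplified round-robin model, not Kafka's actual assignor, and the function name assign_partitions and the consumer names are assumptions for illustration.

```python
def assign_partitions(num_partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin style."""
    assignment = {name: [] for name in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment

# 3 partitions, 4 consumers: the fourth consumer receives nothing.
idle_case = assign_partitions(3, ["c0", "c1", "c2", "c3"])
print(idle_case)

# 5 partitions, 2 consumers: both consumers stay busy.
busy_case = assign_partitions(5, ["c0", "c1"])
print(busy_case)
```

In the first case one consumer is idle, which is why adding consumers beyond the partition count does not increase throughput.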

2. Kafka Operational Commands

Describe topic:

./kafka-topics.sh --describe --zookeeper zk-1:2181,zk-2:2181,zk-3:2181 --topic app.log

List topics:

./kafka-topics.sh --list --zookeeper zk-1:2181,zk-2:2181,zk-3:2181

Create topic:

./kafka-topics.sh --create --zookeeper zk-1:2181,zk-2:2181,zk-3:2181 --topic app.log --partitions 5 --replication-factor 2

Delete topic:

./kafka-topics.sh --delete --zookeeper zk-1:2181,zk-2:2181,zk-3:2181 --topic app.log

Produce messages:

./kafka-console-producer.sh --broker-list kafka-1:9092,kafka-2:9092 --topic app.log

Consume messages:

./kafka-console-consumer.sh --zookeeper zk-1:2181,zk-2:2181,zk-3:2181 --topic app.log

Logstash

Logstash is a server‑side data processing pipeline that ingests data from multiple sources, transforms it, and forwards it to destinations such as Elasticsearch.

Logstash Configuration Example

input {
  kafka {
    type => "kafka"
    bootstrap_servers => "kafka-1:9092,kafka-2:9092,kafka-3:9092"
    topics => ["app.log"]
    consumer_threads => 2
    codec => "json"
  }
}
filter {
  grok {
    # Try the full Apache pattern first; fall back to extracting only the timestamp.
    match => { "message" => ["%{COMBINEDAPACHELOG}", "%{HTTPDATE:timestamp}"] }
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
}
output {
  elasticsearch {
    hosts => ["es-1:9200","es-2:9200","es-3:9200"]
    index => "applogs-%{+YYYY.MM.dd}"
  }
}

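The pipeline reads JSON events from Kafka, extracts the timestamp with the Grok filter, sets the event time from it with the Date filter, and writes the result to daily indices in Elasticsearch.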
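
The following Python sketch mirrors what the filter and output stages do to a single log line, as an aid to understanding rather than a replacement for Logstash. The regex is a simplified stand-in for the HTTPDATE grok pattern, and the function name index_for and the sample line are assumptions. It also illustrates that the daily index name is derived from the event time in UTC, so an event logged late in the evening in a western time zone lands in the next day's index.

```python
import re
from datetime import datetime, timezone

# Simplified stand-in for the HTTPDATE grok pattern,
# e.g. 28/Jul/2019:23:30:00 -0500
HTTPDATE = re.compile(r"\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}")

def index_for(message, prefix="applogs"):
    """Extract the timestamp, convert it to UTC, and build the daily index name."""
    match = HTTPDATE.search(message)
    if match is None:
        # Logstash would tag such an event with _grokparsefailure
        # and keep its original @timestamp; this sketch just returns None.
        return None
    # Same layout as the date filter's "dd/MMM/yyyy:HH:mm:ss Z"
    event_time = datetime.strptime(match.group(0), "%d/%b/%Y:%H:%M:%S %z")
    return event_time.astimezone(timezone.utc).strftime(prefix + "-%Y.%m.%d")

line = '192.168.0.10 - - [28/Jul/2019:23:30:00 -0500] "GET / HTTP/1.1" 200 512'
print(index_for(line))
```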

Elasticsearch Cluster Setup

Elasticsearch nodes are categorized as Master, Data, and Client nodes. Master nodes manage cluster state and metadata, Data nodes store shards and execute indexing and search on them, and Client (coordinating) nodes route requests and merge the partial results.

1. Cluster Configuration (elasticsearch.yml)

cluster.name: es
node.name: es-node1
node.master: true
node.data: true
network.host: 192.168.0.1
discovery.zen.ping.unicast.hosts: ["192.168.0.2","192.168.0.3"]
discovery.zen.minimum_master_nodes: 2
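
The value of discovery.zen.minimum_master_nodes follows the majority rule commonly recommended for zen discovery: (master-eligible nodes / 2) + 1, rounded down. A minimal Python sketch of that arithmetic, assuming three master-eligible nodes here (this node plus the two listed unicast hosts); the function name is illustrative:

```python
def minimum_master_nodes(master_eligible):
    """Quorum size: a strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1

for n in (1, 2, 3, 5):
    print(n, "master-eligible ->", minimum_master_nodes(n))
```

With three master-eligible nodes the quorum is 2, so a single isolated node cannot elect itself master and cause a split brain.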

2. Service Start/Stop

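Start as a daemon:

$ ./bin/elasticsearch -d

Check that all nodes have joined the cluster (replace ip and port with a node's HTTP address; the default HTTP port is 9200):

curl "http://ip:port/_cat/nodes"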

Kibana

Kibana provides visualization for data stored in Elasticsearch.

1. Kibana Configuration (kibana.yml)

server.port: 5601
server.host: "192.168.0.1"
elasticsearch.url: "http://192.168.0.1:9200"
server.basePath: "/kibana"
server.rewriteBasePath: true

2. Kibana Operations

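Start in the background:

$ nohup ./bin/kibana &

Stop: Kibana runs as a Node.js process, so find its PID and terminate it. Use kill -9 only if a plain kill does not stop the process.

ps -ef | grep node
kill <PID>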

Access Kibana at http://192.168.0.1:5601/kibana (the path matches the server.basePath configured above) and use the Discover tab to query logs with Lucene syntax, for example response:200 or message:"Quick brown fox".
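
The same Lucene syntax can also be sent to Elasticsearch directly through a query_string query. The sketch below only builds the JSON request body; the helper name lucene_search_body is hypothetical, and posting it to a URL such as http://192.168.0.1:9200/applogs-*/_search is an assumption based on the index naming used in the Logstash output.

```python
import json

def lucene_search_body(query, size=10):
    """Wrap a Lucene query string in an Elasticsearch query_string search body."""
    return {"size": size, "query": {"query_string": {"query": query}}}

body = lucene_search_body('response:200 AND message:"Quick brown fox"')
print(json.dumps(body, indent=2))
```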

Summary

The article walks through the end‑to‑end deployment of an ELK stack with Kafka and Filebeat, covering log collection, filtering, indexing, and visualization. It also discusses scaling strategies, log format standardization, Grok pattern optimization, TraceId tracing, Elasticsearch storage management, and operational best practices for large‑scale log platforms.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.