Comprehensive Guide to Building an ELK Log Management Platform with Kafka and Filebeat
This article provides a detailed tutorial on designing, deploying, and operating an ELK log management platform—including Elasticsearch, Logstash, Kibana, Kafka, and Filebeat—covering architecture options, configuration files, command‑line operations, cluster setup, and best‑practice recommendations for scalable, real‑time log collection and analysis.
ELK Architecture Classification
ELK consists of three components: Elasticsearch (distributed search engine), Logstash (data collection and processing engine), and Kibana (visualization platform). Together they provide a powerful, open‑source log analysis system.
1. Simplest ELK Architecture
Logstash agents on each node collect logs, filter them, and send them to a remote Elasticsearch server for storage. Kibana visualizes the data.
2. ELK Architecture with Kafka
A Kafka message queue sits between the Logstash agents and the Elasticsearch cluster. It buffers logs while downstream components are slow or unavailable, which improves reliability and reduces the risk of data loss.
3. Filebeat + Kafka + ELK Cluster Architecture
Filebeat replaces Logstash agents for lightweight log shipping, Kafka provides buffering, and both Logstash and Elasticsearch run in cluster mode for high scalability.
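To make the role of the buffer in the second and third architectures concrete, here is a toy sketch in Python. It is not Kafka: a `deque` stands in for a topic, a list stands in for Elasticsearch, and the function name `run_pipeline` and its arguments are illustrative assumptions. It only shows that events written while the indexer is unreachable are held and delivered later.

```python
from collections import deque

def run_pipeline(events, indexer_up):
    """Toy model of shipper -> buffer -> indexer.

    events: log lines, one produced per tick.
    indexer_up: booleans, whether the indexer is reachable at each tick.
    Returns (indexed, still_buffered).
    """
    buffer = deque()   # stands in for a Kafka topic
    indexed = []       # stands in for Elasticsearch
    for line, up in zip(events, indexer_up):
        buffer.append(line)      # the shipper only needs the buffer to be up
        while up and buffer:     # the indexer drains the backlog when reachable
            indexed.append(buffer.popleft())
    return indexed, list(buffer)

indexed, pending = run_pipeline(["e1", "e2", "e3", "e4"],
                                [True, False, False, True])
print(indexed)  # ['e1', 'e2', 'e3', 'e4']
print(pending)  # []
```

Without the buffer, the events produced during the two ticks of downtime would have nowhere to go unless every agent kept its own retry queue.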
Filebeat Service Setup
Filebeat is a Go‑based lightweight agent that consumes fewer resources than Logstash.
1. filebeat.yml Configuration
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /wls/applogs/rtlog/app.log
    fields:
      log_topic: appName
    # Join Java stack-trace lines onto the event that precedes them.
    multiline:
      pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:'
      negate: false
      match: after
output.kafka:
  enabled: true
  hosts: ["kafka-1:9092","kafka-2:9092"]
  # Must match the topic that Logstash consumes.
  topic: app.log
  version: "0.10.2.0"
  compression: gzip
processors:
  - drop_fields:
      fields: ["beat", "input", "source", "offset"]
logging.level: error
name: app-server-ip
2. Common Operations Commands
Start in foreground:
./filebeat -e -c filebeat.yml
Start as a background process:
nohup ./filebeat -e -c filebeat.yml &
Start as a background process and redirect output to /dev/null:
nohup ./filebeat -e -c filebeat.yml >/dev/null 2>&1 &
Stop: find the process ID, then kill the process:
ps -ef | grep filebeat
kill -9 <PID>
3. Filebeat Debugging
Validate collection by running Filebeat in the foreground:
./filebeat -e -c filebeat.yml
Verify that events reach Kafka by consuming the topic:
./kafka-console-consumer.sh --zookeeper zk-1:2181,zk-2:2181 --topic app.log
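When multiline grouping looks wrong in the consumed events, it can help to test the continuation pattern offline. The sketch below is a rough Python rendering of the stack-trace pattern, with whitespace required after `at` or `...`. Python's `re` module has no POSIX classes, so `[[:space:]]` is written as `\s`, and Filebeat itself uses Go's regexp engine, so treat this as an approximation. The function name `join_multiline` and the sample lines are illustrative assumptions.

```python
import re

# Python's re has no POSIX classes, so [[:space:]] becomes \s.
CONTINUATION = re.compile(r'^\s+(at|\.{3})\s+\b|^Caused by:')

def join_multiline(lines):
    """Mimic negate: false / match: after.

    A line that matches the pattern is appended to the event before it;
    any other line starts a new event.
    """
    events = []
    for line in lines:
        if events and CONTINUATION.search(line):
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events

sample = [
    'Exception in thread "main" java.lang.IllegalStateException: bad state',
    '    at com.example.App.run(App.java:10)',
    'Caused by: java.lang.NullPointerException',
    '    ... 5 more',
    'INFO next event',
]
events = join_multiline(sample)
print(len(events))  # 2
```

The first event holds the whole stack trace (four lines) and the second is the unrelated INFO line.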
Kafka Cluster Setup
A typical Kafka cluster includes multiple producers, brokers, consumer groups, and a Zookeeper ensemble for coordination.
1. Kafka Configuration (server.properties)
broker.id=1
port=9092
host.name=192.168.0.1
num.replica.fetchers=1
log.dirs=/opt/kafka_logs
num.partitions=3
zookeeper.connect=zk-1:2181,zk-2:2181,zk-3:2181
zookeeper.connection.timeout.ms=6000
zookeeper.sync.time.ms=2000
num.io.threads=8
num.network.threads=8
queued.max.requests=16
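delete.topic.enable=true
Key parameters: num.partitions sets the default partition count for new topics. It should be at least the number of consumers in a consumer group, because within a group each partition is read by only one consumer and any surplus consumers sit idle. delete.topic.enable enables physical deletion of topics.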
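The relationship between partition count and consumer count can be sketched with a small model. This is a simplified, single-topic split modeled on a range-style assignment, not Kafka's actual assignor code, and the function name `assign_partitions` and the consumer IDs are illustrative assumptions.

```python
def assign_partitions(num_partitions, consumers):
    """Spread the partitions of one topic over one consumer group.

    Each consumer gets a contiguous block; the first
    (num_partitions % len(consumers)) consumers get one extra partition.
    """
    members = sorted(consumers)
    base, extra = divmod(num_partitions, len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = base + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment

print(assign_partitions(3, ["c1", "c2"]))
# {'c1': [0, 1], 'c2': [2]}
print(assign_partitions(3, ["c1", "c2", "c3", "c4"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```

In the second call the fourth consumer receives nothing, which is why adding consumers beyond the partition count does not add throughput.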
2. Kafka Operational Commands
Describe topic:
./kafka-topics.sh --describe --zookeeper zk-1:2181,zk-2:2181,zk-3:2181 --topic app.log
List topics:
./kafka-topics.sh --zookeeper zk-1:2181,zk-2:2181,zk-3:2181 --list
Create topic:
./kafka-topics.sh --zookeeper zk-1:2181,zk-2:2181,zk-3:2181 --create --topic app.log --partitions 5 --replication-factor 2
Delete topic:
./kafka-topics.sh --delete --zookeeper zk-1:2181,zk-2:2181,zk-3:2181 --topic app.log
Produce messages:
./kafka-console-producer.sh --broker-list kafka-1:9092,kafka-2:9092 --topic app.log
Consume messages:
./kafka-console-consumer.sh --zookeeper zk-1:2181,zk-2:2181,zk-3:2181 --topic app.log
Logstash
Logstash is a server‑side data processing pipeline that ingests data from multiple sources, transforms it, and forwards it to destinations such as Elasticsearch.
Logstash Configuration Example
input {
  kafka {
    type => "kafka"
    bootstrap_servers => "kafka-1:9092,kafka-2:9092,kafka-3:9092"
    topics => ["app.log"]
    consumer_threads => 2
    codec => "json"
  }
}
filter {
  grok {
    # Try the full Apache pattern first, then fall back to a bare timestamp.
    match => { "message" => ["%{COMBINEDAPACHELOG}", "%{HTTPDATE:timestamp}"] }
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
}
output {
  elasticsearch {
    # This output talks to Elasticsearch over HTTP (9200), not the 9300 transport port.
    hosts => ["es-1:9200","es-2:9200","es-3:9200"]
    index => "applogs-%{+YYYY.MM.dd}"
  }
}
The pipeline includes a Kafka input, Grok and Date filters, and an Elasticsearch output.
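As a rough illustration of what the Date filter and the daily index name do together, the Python sketch below parses an Apache-style timestamp and derives the index an event would land in. It is not Logstash code. The function name `index_for` and the sample timestamps are illustrative assumptions, and the sketch relies on the date in the index name being taken from the event timestamp in UTC.

```python
from datetime import datetime, timezone

def index_for(httpdate, prefix="applogs-"):
    """Parse an Apache-style timestamp and return the daily index name."""
    # strptime equivalent of the pattern dd/MMM/yyyy:HH:mm:ss Z
    ts = datetime.strptime(httpdate, "%d/%b/%Y:%H:%M:%S %z")
    # Convert to UTC before formatting the date part of the index name.
    return prefix + ts.astimezone(timezone.utc).strftime("%Y.%m.%d")

print(index_for("10/Oct/2000:13:55:36 -0700"))  # applogs-2000.10.10
print(index_for("31/Dec/2023:23:30:00 -0700"))  # applogs-2024.01.01
```

The second call shows an event logged late on 31 December local time ending up in the 1 January index, which is worth remembering when searching around midnight.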
Elasticsearch Cluster Setup
Elasticsearch nodes are categorized as Master, Data, and Client nodes. Master nodes manage metadata, Data nodes store shards, and Client nodes handle query aggregation.
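Because master-eligible nodes elect one of themselves to manage cluster metadata, clusters that use zen discovery rely on the discovery.zen.minimum_master_nodes setting to avoid a split brain. The usual rule is a strict majority of the master-eligible nodes. The sketch below shows that arithmetic; the function name is an illustrative assumption.

```python
def minimum_master_nodes(master_eligible):
    """Strict majority of the master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible // 2 + 1

for n in (1, 2, 3, 5):
    print(n, minimum_master_nodes(n))
# 1 1
# 2 2
# 3 2
# 5 3
```

For a three-node cluster in which every node is master-eligible, the result is 2.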
1. Cluster Configuration (elasticsearch.yml)
cluster.name: es
node.name: es-node1
node.master: true
node.data: true
network.host: 192.168.0.1
discovery.zen.ping.unicast.hosts: ["192.168.0.2","192.168.0.3"]
discovery.zen.minimum_master_nodes: 2
2. Service Start/Stop
Start as a daemon:
./bin/elasticsearch -d
Check which nodes have joined the cluster:
curl "http://ip:port/_cat/nodes"
Kibana
Kibana provides visualization for data stored in Elasticsearch.
1. Kibana Configuration (kibana.yml)
server.port: 5601
server.host: "192.168.0.1"
elasticsearch.url: "http://192.168.0.1:9200"
server.basePath: "/kibana"
server.rewriteBasePath: true
2. Kibana Operations
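Start as a background process:
nohup ./bin/kibana &
Stop: find the Node.js process that runs Kibana, then kill it:
ps -ef | grep node
kill -9 <PID>
Access Kibana via http://192.168.0.1:5601/kibana (the path follows the server.basePath setting) and use the Discover tab to query logs with Lucene syntax (e.g., response:200, message:"Quick brown fox").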
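The same Lucene-style strings can also be sent to Elasticsearch directly, which is handy for scripted checks. The sketch below only builds a request body using the `query_string` query. The function name `lucene_search_body`, the default size, and the sort on `@timestamp` are illustrative assumptions, and the body would typically be posted to a `_search` endpoint on an index pattern such as `applogs-*`.

```python
import json

def lucene_search_body(query, size=10):
    """Build a search body that runs a Lucene-syntax query string."""
    return {
        "size": size,
        "query": {"query_string": {"query": query}},
        # Newest events first, assuming events carry an @timestamp field.
        "sort": [{"@timestamp": {"order": "desc"}}],
    }

body = lucene_search_body('response:200 AND message:"Quick brown fox"')
print(json.dumps(body, indent=2))
```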
Summary
The article walks through the end‑to‑end deployment of an ELK stack with Kafka and Filebeat, covering log collection, filtering, indexing, and visualization. It also discusses scaling strategies, log format standardization, Grok pattern optimization, TraceId tracing, Elasticsearch storage management, and operational best practices for large‑scale log platforms.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
