
Step-by-Step Guide to Building an ELK Stack with Kafka, Zookeeper, Logstash, and Filebeat for Log Collection

This tutorial provides a comprehensive, step-by-step procedure for setting up a log‑collection pipeline using Filebeat, Kafka, Zookeeper, Logstash, Elasticsearch, and Kibana across multiple servers, covering hardware preparation, system tuning, software installation, configuration files, and verification commands.

Big Data Technology & Architecture

Workflow: Filebeat collects log files and forwards them to a Kafka cluster. Logstash consumes the Kafka messages, formats them, and stores them in Elasticsearch, where Kibana visualizes the logs.

Hardware Requirements: Four servers are used; each must have a JDK installed and the Java environment variables configured.

System Tuning:

sudo vi /etc/profile
export JAVA_HOME=<JDK installation path>
export PATH=$JAVA_HOME/bin:$PATH

vim /etc/sysctl.conf
fs.file-max=65536
vm.max_map_count=262144

vim /etc/security/limits.conf
* soft nofile 65535
* hard nofile 131072
* soft nproc 2048
* hard nproc 4096
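To confirm that the tuning values are actually in effect, a small check along these lines may help. It is a minimal sketch: the helper name `check_min` is hypothetical, and the thresholds are simply the values from the tuning step.

```shell
#!/bin/sh
# check_min NAME ACTUAL REQUIRED
# Prints OK, LOW, or UNKNOWN for one kernel parameter or ulimit value.
# (check_min is a hypothetical helper, not part of any of the tools.)
check_min() {
    name=$1; actual=$2; required=$3
    case $actual in
        ''|*[!0-9]*)
            # Empty or non-numeric (for example "unlimited"): cannot compare.
            echo "UNKNOWN $name=$actual (want >= $required)"
            return 2 ;;
    esac
    if [ "$actual" -ge "$required" ]; then
        echo "OK $name=$actual (>= $required)"
    else
        echo "LOW $name=$actual (want >= $required)"
        return 1
    fi
}

# Thresholds below are the values from the tuning step.
check_min vm.max_map_count "$(cat /proc/sys/vm/max_map_count 2>/dev/null)" 262144 || true
check_min fs.file-max "$(cat /proc/sys/fs/file-max 2>/dev/null)" 65536 || true
check_min nofile "$(ulimit -n)" 65535 || true
```

If a value shows as LOW, note that edits to /etc/sysctl.conf only take effect after `sysctl -p` (or a reboot), and edits to /etc/security/limits.conf only apply to new login sessions.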

Software Versions: (image omitted)

Kafka & Zookeeper Installation

On servers 10.16.10.113, 10.16.10.114, and 10.16.8.187, install Kafka and disable the firewall:

systemctl stop firewalld
systemctl status firewalld

Zookeeper (using Kafka's bundled Zookeeper):

vim config/zookeeper.properties

clientPort=2181
maxClientCnxns=100
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/kafka/zookeeper/data
dataLogDir=/usr/local/kafka/zookeeper/log
server.1=10.16.10.113:12888:13888
server.2=10.16.10.114:12888:13888
server.3=10.16.8.187:12888:13888

Create a myid file in each server's dataDir. The file contains only that server's number from its server.N entry (1, 2, or 3).
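The myid step could be scripted roughly as follows. This is a sketch under the assumption that dataDir is the path from zookeeper.properties; `write_myid` is a hypothetical helper name.

```shell
#!/bin/sh
# write_myid DATA_DIR SERVER_ID
# Creates DATA_DIR if needed and writes SERVER_ID as the only line of DATA_DIR/myid.
# (write_myid is a hypothetical helper, not a Zookeeper command.)
write_myid() {
    data_dir=$1; server_id=$2
    mkdir -p "$data_dir" || return 1
    printf '%s\n' "$server_id" > "$data_dir/myid"
}

# On 10.16.10.113 (server.1) the call would be:
#   write_myid /usr/local/kafka/zookeeper/data 1
# On 10.16.10.114 (server.2) use 2, and on 10.16.8.187 (server.3) use 3.
```

The number in myid has to match the N of the server.N line that names the same host, otherwise the node cannot identify itself in the ensemble.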

Kafka Broker configuration:

vim config/server.properties

broker.id=1
port=9092
host.name=10.16.10.113
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/usr/local/kafka-logs
num.partitions=16
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=10.16.10.113:2181,10.16.10.114:2181,10.16.8.187:2181
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0
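The configuration shown is for the broker on 10.16.10.113. Each broker needs its own unique broker.id and its own address in host.name, so the other two files differ in those two lines. One way to derive them is sketched below; `render_broker_config` is a hypothetical helper, and pairing IDs 2 and 3 with 10.16.10.114 and 10.16.8.187 is an assumption, since the article only shows broker 1.

```shell
#!/bin/sh
# render_broker_config TEMPLATE BROKER_ID BROKER_IP
# Prints TEMPLATE with its broker.id and host.name lines replaced.
# (Hypothetical helper; all other settings are passed through unchanged.)
render_broker_config() {
    sed -e "s/^broker\.id[[:space:]]*=.*/broker.id=$2/" \
        -e "s/^host\.name[[:space:]]*=.*/host.name=$3/" "$1"
}

# Example (the ID-to-host pairing is an assumption):
#   render_broker_config config/server.properties 2 10.16.10.114 > /tmp/server-2.properties
#   render_broker_config config/server.properties 3 10.16.8.187  > /tmp/server-3.properties
```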

Start Zookeeper first, then Kafka, from Kafka's bin directory on each server:

nohup sh zookeeper-server-start.sh ../config/zookeeper.properties &
nohup sh kafka-server-start.sh ../config/server.properties &

Create and test a topic:

/usr/local/kafka/bin/kafka-topics.sh --create --zookeeper 10.16.10.113:2181,10.16.10.114:2181,10.16.8.187:2181 --replication-factor 1 --partitions 2 --topic testtopic
/usr/local/kafka/bin/kafka-topics.sh --zookeeper 10.16.10.113:2181,10.16.10.114:2181,10.16.8.187:2181 --list
/usr/local/kafka/bin/kafka-console-producer.sh --broker-list 10.16.10.113:9092 --topic testtopic
/usr/local/kafka/bin/kafka-console-consumer.sh --bootstrap-server 10.16.10.113:9092 --from-beginning --topic testtopic

ELK Stack Installation

Install Elasticsearch on 10.16.10.113, 10.16.10.114, and 10.16.3.165, with 10.16.3.165 as the master node. Edit elasticsearch.yml on the master:

cluster.name: elkmaster
node.name: 10.16.3.165
node.master: true
path.logs: /usr/local/data/log/
network.host: 10.16.3.165
http.port: 9200
discovery.zen.ping.unicast.hosts: ["10.16.10.113","10.16.10.114"]
cluster.initial_master_nodes: ["10.16.3.165"]

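On the other nodes, set node.master: false, keep cluster.name identical to the master's (nodes only join a cluster whose name matches), and change node.name and network.host to the node's own address.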
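As an illustration, the file for a non-master node might be generated like this. It is a sketch, not the author's configuration: `render_es_data_node` is a hypothetical helper, and pointing discovery.zen.ping.unicast.hosts at the master 10.16.3.165 is an assumption, because the article only shows the master's own file.

```shell
#!/bin/sh
# render_es_data_node NODE_IP
# Prints an elasticsearch.yml for a non-master node.
# (Hypothetical helper. The discovery host list is an assumption.)
render_es_data_node() {
    cat <<EOF
cluster.name: elkmaster
node.name: $1
node.master: false
path.logs: /usr/local/data/log/
network.host: $1
http.port: 9200
discovery.zen.ping.unicast.hosts: ["10.16.3.165"]
EOF
}

# Example:
#   render_es_data_node 10.16.10.113 > config/elasticsearch.yml
```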

Install Kibana on 10.16.3.165 and edit kibana.yml:

server.port: 5601
server.host: "10.16.3.165"
elasticsearch.hosts: ["http://10.16.3.165:9200"]
i18n.locale: "zh-CN"

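Start the services as a non-root user (Elasticsearch refuses to run as root):

# Elasticsearch, from its bin directory:
nohup sh elasticsearch &
# or, from the Elasticsearch install directory, as a daemon:
./bin/elasticsearch -d
# Kibana, from its bin directory:
nohup sh kibana &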

Verify by accessing http://10.16.3.165:9200 (Elasticsearch) and http://10.16.3.165:5601 (Kibana).
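The same check can be done from a terminal by asking Elasticsearch for its cluster health and reading the status field. The sketch below assumes `curl` and `sed` are available; `health_status` is a hypothetical helper that does a rough text match rather than real JSON parsing.

```shell
#!/bin/sh
# health_status
# Reads a _cluster/health JSON document on stdin and prints its "status"
# value (green, yellow, or red). Hypothetical helper; a rough sed match.
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Usage against the master node from this guide:
#   curl -s http://10.16.3.165:9200/_cluster/health | health_status
```

A status of green or yellow means the cluster is answering requests; red means at least one primary shard is unassigned.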

Filebeat Installation

On server 10.16.3.166, edit filebeat.yml to read logs and output to Kafka:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /data/home/app/domains/cpay_domain/logs/cpay-tms-gate.log
output.kafka:
  enabled: true
  hosts: ["10.16.8.187:9092"]
  topic: es-tmslogs
  compression: gzip
  max_message_bytes: 100000
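Before starting Filebeat, it can be worth validating the file and the connection to Kafka with Filebeat's `test config` and `test output` subcommands. The wrapper below is a sketch: `filebeat_preflight` and the `FILEBEAT_BIN` variable are hypothetical names, and it assumes the binary sits in the current directory as in the start command that follows.

```shell
#!/bin/sh
# filebeat_preflight CONFIG
# Runs Filebeat's built-in config and output checks against CONFIG.
# FILEBEAT_BIN is a hypothetical override for the binary location.
filebeat_preflight() {
    fb=${FILEBEAT_BIN:-./filebeat}
    if [ ! -x "$fb" ]; then
        echo "filebeat binary not found at $fb"
        return 1
    fi
    # Check the YAML first, then try to reach the configured output.
    "$fb" test config -c "$1" && "$fb" test output -c "$1"
}

# Example, run from the Filebeat install directory:
#   filebeat_preflight filebeat.yml
```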

Start Filebeat:

./filebeat -e -c filebeat.yml

Logstash Installation

On 10.16.3.165, create logstashfortms.conf:

input {
    kafka {
        bootstrap_servers => "10.16.10.113:9092,10.16.10.114:9092,10.16.8.187:9092"
        topics => ["es-tmslogs"]
        codec => json
    }
}
output {
    elasticsearch {
        hosts => ["10.16.3.165:9200"]
        index => "logstash-%{+YYYY.MM.dd}"
    }
}
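The index setting produces one index per day, named from each event's timestamp, which Logstash formats in UTC. To know which index to look for once events flow, the name for today can be computed as sketched below; `logstash_index_for_today` is a hypothetical helper.

```shell
#!/bin/sh
# logstash_index_for_today
# Prints the index name that "logstash-%{+YYYY.MM.dd}" yields for events
# timestamped today in UTC. (Hypothetical helper.)
logstash_index_for_today() {
    date -u +logstash-%Y.%m.%d
}

logstash_index_for_today

# To list matching indices on the Elasticsearch node from this guide:
#   curl -s "http://10.16.3.165:9200/_cat/indices/logstash-*?v"
```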

Start Logstash:

nohup sh logstash -f ../config/logstashfortms.conf &

Kibana Page Operations

After Kibana is running, open http://10.16.3.165:5601, create an index pattern that matches the index name set in the Logstash output (for example logstash-*), and explore the visualized logs.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
