
How to Build a Fault‑Tolerant Kafka‑ZooKeeper Cluster on Docker

This step‑by‑step guide shows how to deploy a three‑node Kafka and ZooKeeper cluster in Docker, covering JDK installation, ZooKeeper configuration, server ID setup, Kafka download, configuration tweaks, topic management, and verification of leader‑follower relationships.

MaGe Linux Operations

Cluster Environment

Three Docker containers are used as Kafka nodes (kafka_node1, kafka_node2, kafka_node3) with the following IPs and software versions:

IP Address   Hostname      Kafka Package          ZooKeeper Package         JDK Package
172.17.0.2   kafka_node1   kafka_2.12-2.2.1.tgz   zookeeper-3.4.14.tar.gz   jdk-8u161-linux-x64.tar.gz
172.17.0.3   kafka_node2   kafka_2.12-2.2.1.tgz   zookeeper-3.4.14.tar.gz   jdk-8u161-linux-x64.tar.gz
172.17.0.4   kafka_node3   kafka_2.12-2.2.1.tgz   zookeeper-3.4.14.tar.gz   jdk-8u161-linux-x64.tar.gz

Note: In production, store data on reliable or redundant storage devices.

Deploy JDK

tar xf jdk-8u161-linux-x64.tar.gz -C /usr/local
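# Quote the delimiter so $JAVA_HOME, $JRE_HOME and $PATH are written literally
# instead of being expanded (possibly to empty strings) by the current shell
cat <<'EOF' >> /etc/profile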
#################JAVA#################
export JAVA_HOME=/usr/local/jdk1.8.0_161
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
EOF
source /etc/profile
java -version

Deploy ZooKeeper

Download and extract ZooKeeper:

wget https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.4.14/zookeeper-3.4.14.tar.gz
tar xf zookeeper-3.4.14.tar.gz -C /usr/local

Copy the sample configuration to zoo.cfg, then overwrite it with the cluster settings:

cp -rf /usr/local/zookeeper-3.4.14/conf/zoo_sample.cfg /usr/local/zookeeper-3.4.14/conf/zoo.cfg
cat <<EOF > /usr/local/zookeeper-3.4.14/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zk_data
dataLogDir=/usr/local/zookeeper-3.4.14/logs
clientPort=2181
maxClientCnxns=60
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.1=172.17.0.2:2888:3888
server.2=172.17.0.3:2888:3888
server.3=172.17.0.4:2888:3888
EOF
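The timing values in zoo.cfg are expressed in ticks, so the effective timeouts follow from simple arithmetic. The sketch below works them out for the values above, together with the quorum size for a three-server ensemble. The variable names are illustrative only and do not come from ZooKeeper.

```shell
# Values copied from the zoo.cfg above
tick_ms=2000     # tickTime, in milliseconds
init_limit=10    # initLimit, in ticks
sync_limit=5     # syncLimit, in ticks
servers=3        # number of server.N entries

# Time a follower has to connect to the leader and sync its state
init_timeout_ms=$((tick_ms * init_limit))
# Maximum time a follower may lag behind the leader before it is dropped
sync_timeout_ms=$((tick_ms * sync_limit))

# A majority of servers must be up for the ensemble to serve requests
quorum=$((servers / 2 + 1))
tolerated_failures=$((servers - quorum))

echo "initLimit window: ${init_timeout_ms} ms"
echo "syncLimit window: ${sync_timeout_ms} ms"
echo "quorum: ${quorum} of ${servers}, tolerated failures: ${tolerated_failures}"
```

With three servers the ensemble keeps working after losing one node, but not two.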

Create data and log directories and the myid file for each node:

mkdir -p /data/zk_data
mkdir -p /usr/local/zookeeper-3.4.14/logs
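# Run only the line that matches the current node; each myid must be unique
# and must equal the N of that node's server.N entry in zoo.cfg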
echo "1" > /data/zk_data/myid   # on kafka_node1
echo "2" > /data/zk_data/myid   # on kafka_node2
echo "3" > /data/zk_data/myid   # on kafka_node3
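Because the myid value has to match the server.N line for the node's own IP, it can be derived from zoo.cfg instead of being typed by hand. The following is a minimal sketch; myid_for_ip is a hypothetical helper, not part of ZooKeeper, and it assumes the server.N=IP:port:port layout used above.

```shell
# Hypothetical helper: print the N of the "server.N=<ip>:..." line in zoo.cfg
# $1 = node IP, $2 = path to zoo.cfg
myid_for_ip() {
  awk -F'[=:]' -v ip="$1" '
    /^server\.[0-9]+=/ && $2 == ip {
      sub(/^server\./, "", $1)   # keep only the numeric id
      print $1
    }' "$2"
}

# Usage on kafka_node2 (not executed here):
#   myid_for_ip 172.17.0.3 /usr/local/zookeeper-3.4.14/conf/zoo.cfg > /data/zk_data/myid
```

The function prints nothing when the IP has no matching entry, so an empty myid file signals a configuration mismatch.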

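Start ZooKeeper on every node (/usr/local/zookeeper-3.4.14/bin/zkServer.sh start), then verify the leader/follower roles with lsof -i:2888. Only the leader listens on port 2888; the followers hold established connections to it.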
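The role of each node can also be read from zkServer.sh status, which reports a "Mode:" line. A small sketch of extracting it follows; zk_mode is a hypothetical helper name, and the paths assume the install location used in this guide.

```shell
ZK_HOME=/usr/local/zookeeper-3.4.14

# Hypothetical helper: read `zkServer.sh status` output on stdin
# and print only the role (leader, follower or standalone)
zk_mode() {
  awk -F': ' '/^Mode:/ { print $2 }'
}

# Usage on each node (not executed here):
#   "$ZK_HOME/bin/zkServer.sh" start
#   "$ZK_HOME/bin/zkServer.sh" status 2>&1 | zk_mode
```

In a healthy three-node ensemble, one node should report leader and the other two follower.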

Deploy Kafka

Download and extract Kafka:

wget https://mirror.bit.edu.cn/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz
tar xf kafka_2.12-2.2.1.tgz -C /usr/local

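Back up the original server.properties, then write the new configuration. The values below are for kafka_node1; broker.id must be unique on every broker, and listeners must carry each node's own IP: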

cp -rf /usr/local/kafka_2.12-2.2.1/config/server.properties /usr/local/kafka_2.12-2.2.1/config/server.properties.default
cat <<EOF > /usr/local/kafka_2.12-2.2.1/config/server.properties
broker.id=1
listeners=PLAINTEXT://172.17.0.2:9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/data/kafka-logs/
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=72
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=172.17.0.2:2181,172.17.0.3:2181,172.17.0.4:2181
delete.topic.enable=true
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=3000
EOF
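Since only broker.id and listeners differ between nodes, the same file can be written everywhere and then patched per node. The sketch below shows one way to do that; set_broker_identity is a hypothetical helper, and the mapping of broker IDs 2 and 3 to kafka_node2 and kafka_node3 is an assumption, as the guide only shows the values for kafka_node1.

```shell
# Hypothetical helper: rewrite broker.id and listeners in a server.properties file
# $1 = path to server.properties, $2 = broker id, $3 = node IP
set_broker_identity() {
  sed -e "s|^broker\.id=.*|broker.id=$2|" \
      -e "s|^listeners=.*|listeners=PLAINTEXT://$3:9092|" \
      "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Usage (not executed here; IDs 2 and 3 are assumed):
#   set_broker_identity /usr/local/kafka_2.12-2.2.1/config/server.properties 2 172.17.0.3   # kafka_node2
#   set_broker_identity /usr/local/kafka_2.12-2.2.1/config/server.properties 3 172.17.0.4   # kafka_node3
```

Writing to a temporary file and renaming it avoids relying on sed -i, whose behavior differs between implementations.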

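Start each Kafka broker with the -daemon flag (/usr/local/kafka_2.12-2.2.1/bin/kafka-server-start.sh -daemon /usr/local/kafka_2.12-2.2.1/config/server.properties), then verify the processes with jps and the listening ports with netstat -anplt | egrep "(2181|9092)".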

Manage Kafka Topics

Create a replicated topic:

/usr/local/kafka_2.12-2.2.1/bin/kafka-topics.sh --create \
  --bootstrap-server 172.17.0.2:9092,172.17.0.3:9092,172.17.0.4:9092 \
  --replication-factor 3 --partitions 3 --topic kafka_data

List topics through each broker in turn (change the --bootstrap-server address) to confirm that every broker sees the new topic:

/usr/local/kafka_2.12-2.2.1/bin/kafka-topics.sh --list --bootstrap-server 172.17.0.2:9092

Produce messages:

/usr/local/kafka_2.12-2.2.1/bin/kafka-console-producer.sh \
  --broker-list 172.17.0.2:9092 --topic kafka_data
> HelloKafka_data
> I'm the 172.17.0.2 Kafka create
> test

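Consume from another node to verify replication. Because the topic has three partitions and ordering is only guaranteed within a partition, the messages may arrive in a different order than they were produced: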

/usr/local/kafka_2.12-2.2.1/bin/kafka-console-consumer.sh \
  --bootstrap-server 172.17.0.4:9092 --topic kafka_data --from-beginning
I'm the 172.17.0.2 Kafka create
test
HelloKafka_data

Describe the topic to see partition leaders and ISR:

/usr/local/kafka_2.12-2.2.1/bin/kafka-topics.sh --describe \
  --bootstrap-server 172.17.0.2:9092 --topic kafka_data
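For a fault-tolerance check, the interesting part of the describe output is whether each partition's Isr list is as long as its Replicas list. The sketch below filters for partitions where it is not. under_replicated is a hypothetical helper, and it assumes the output keeps the usual whitespace-separated "Partition:", "Replicas:" and "Isr:" fields.

```shell
# Hypothetical helper: read `kafka-topics.sh --describe` output on stdin and
# print every partition whose in-sync replica set is smaller than its replica set
under_replicated() {
  awk '
    /Partition:/ {
      part = reps = isr = ""
      for (i = 1; i < NF; i++) {
        if ($i == "Partition:") part = $(i + 1)
        if ($i == "Replicas:")  reps = $(i + 1)
        if ($i == "Isr:")       isr  = $(i + 1)
      }
      # Compare the number of comma-separated broker ids in each list
      if (split(reps, r, ",") != split(isr, s, ","))
        print "partition " part ": replicas=" reps " isr=" isr
    }'
}

# Usage (not executed here):
#   /usr/local/kafka_2.12-2.2.1/bin/kafka-topics.sh --describe \
#     --bootstrap-server 172.17.0.2:9092 --topic kafka_data | under_replicated
```

Empty output means every partition is fully in sync. kafka-topics.sh also offers an --under-replicated-partitions option for --describe that serves the same purpose without extra parsing.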

Delete the topic. With delete.topic.enable=true set in server.properties, the deletion propagates to all brokers:

/usr/local/kafka_2.12-2.2.1/bin/kafka-topics.sh --delete \
  --bootstrap-server 172.17.0.2:9092 --topic kafka_data

After deletion, the topic no longer appears in the list on any broker.
