
Master Kafka: From Basics to Multi‑Broker Cluster Setup

This comprehensive guide introduces Apache Kafka's core concepts—topics, partitions, producers, consumers, and APIs—covers common use cases, walks through downloading, installing, and configuring a single‑node broker, demonstrates multi‑broker clustering, and explains how to use Kafka Connect for data import and export.

Raymond Ops

1. Understanding Kafka

1.1 Kafka Overview

Kafka is a distributed streaming platform. Its website is http://kafka.apache.org/.

Its key capabilities are publishing and subscribing to streams of records, storing those streams durably with fault tolerance, and processing records as they occur.

1.2 Topics and Partitions

A topic is a named category of messages. Each topic is split into partitions, which are ordered, immutable logs stored on disk; every record appended to a partition is assigned a sequential offset. Each record has a key, a value, and a timestamp.
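To make the log idea concrete, here is a minimal in-memory sketch of a single partition. It is a toy model, not Kafka's storage code: the names `Record` and `PartitionLog` are hypothetical, and a real broker writes segments to disk rather than keeping a Python list.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)  # frozen: a record cannot be changed once written
class Record:
    key: Optional[str]
    value: str
    timestamp: float


@dataclass
class PartitionLog:
    """Toy stand-in for one partition: an append-only list of records."""
    records: List[Record] = field(default_factory=list)

    def append(self, key: Optional[str], value: str) -> int:
        """Add a record at the end and return its position (its offset)."""
        self.records.append(Record(key, value, time.time()))
        return len(self.records) - 1

    def read(self, offset: int) -> List[Record]:
        """Return every record from the given offset onward, in order."""
        return self.records[offset:]


log = PartitionLog()
first = log.append("user-1", "page_view")   # offset 0
second = log.append("user-2", "search")     # offset 1
print([r.value for r in log.read(1)])       # ['search']
```

Records are only ever added at the end, so a reader that remembers its last position can resume from exactly where it stopped.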

1.3 Distribution

Partitions are spread across the brokers in the cluster. For each partition, one broker acts as the leader and handles its reads and writes, while the followers replicate it. If the leader fails, one of the followers automatically becomes the new leader.
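The failover described here can be sketched as a simple selection rule. This is a deliberate simplification and an assumption on my part: the function `elect_leader` is hypothetical, and a real cluster also restricts the choice to replicas that are fully caught up, which this sketch ignores.

```python
from typing import List, Optional, Set


def elect_leader(replicas: List[int], alive: Set[int]) -> Optional[int]:
    """Pick the first replica, in assignment order, whose broker is alive.

    Returns None when every replica of the partition is down.
    """
    for broker_id in replicas:
        if broker_id in alive:
            return broker_id
    return None


replicas = [1, 2, 0]                            # broker ids holding this partition
leader = elect_leader(replicas, {0, 1, 2})      # all brokers up
new_leader = elect_leader(replicas, {0, 2})     # broker 1 has failed
print(leader, new_leader)                       # 1 2
```

With a replication factor of N, a partition modelled this way stays available as long as at least one of its N replicas survives.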

1.4 Producers and Consumers

1.4.1 Producers

Producers publish records to the topics of their choice and decide which partition each record goes to, for example round-robin to balance load, or by a function of the record key.
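A rough sketch of both partition-selection strategies follows. The class `ToyPartitioner` is hypothetical, and `zlib.crc32` is used only as a convenient stand-in hash; it is not the hash function Kafka's own client uses.

```python
import itertools
import zlib
from typing import Optional


class ToyPartitioner:
    """Choose a partition: round-robin without a key, hash-based with one."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._counter = itertools.count()

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: rotate through the partitions to spread the load.
            return next(self._counter) % self.num_partitions
        # Keyed: the same key always maps to the same partition.
        return zlib.crc32(key) % self.num_partitions


p = ToyPartitioner(3)
keyless = [p.partition(None) for _ in range(4)]
print(keyless)                                             # [0, 1, 2, 0]
print(p.partition(b"user-1") == p.partition(b"user-1"))    # True
```

Sending all records with one key to one partition is what preserves their relative order, since ordering is only guaranteed within a partition.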

1.4.2 Consumers

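Each consumer belongs to a consumer group, and every group subscribed to a topic receives its own copy of the topic's records. Within a group, the partitions are divided among the consumer instances, so each record is delivered to one instance in that group.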
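The group behaviour can be illustrated with a small assignment function. This is a sketch under stated assumptions: `assign_partitions` and the member names are hypothetical, and the simple round-robin split shown here is only one possible strategy, not necessarily the one a given Kafka client applies.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], members: List[str]) -> Dict[str, List[int]]:
    """Deal partitions out to the members of one group, round-robin."""
    if not members:
        raise ValueError("a consumer group needs at least one member")
    ordered = sorted(members)
    assignment: Dict[str, List[int]] = {m: [] for m in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


topic_partitions = [0, 1, 2, 3]
group_a = assign_partitions(topic_partitions, ["a1", "a2"])  # two instances share the work
group_b = assign_partitions(topic_partitions, ["b1"])        # one instance reads everything
print(group_a)   # {'a1': [0, 2], 'a2': [1, 3]}
print(group_b)   # {'b1': [0, 1, 2, 3]}
```

Both groups cover all four partitions, which is the "each group gets a copy" part; inside group A no partition appears twice, which is the load-balancing part.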

1.5 Use Cases

Messaging – replaces traditional brokers with higher throughput, built‑in partitioning, replication, and fault tolerance.

Website activity tracking – real‑time pipelines for page views, searches, etc.

Metrics aggregation – central feed for operational statistics.

Log aggregation – replaces systems like Scribe or Flume with lower latency and stronger durability.

Stream processing – Kafka Streams API enables stateful transformations and joins.

Event sourcing – durable log of state changes for applications.

Commit log – external durable log for distributed systems.

2. Kafka Installation

Download the desired version from http://kafka.apache.org/downloads.html (e.g., 2.1.0) and extract it.

[root@host]# wget http://mirrors.shu.edu.cn/apache/kafka/2.1.0/kafka_2.11-2.1.0.tgz
[root@host]# tar -C /data/ -xvf kafka_2.11-2.1.0.tgz
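[root@host]# cd /data/kafka_2.11-2.1.0/
[root@host]# export PATH=$PATH:/data/kafka_2.11-2.1.0/bin   # lets the Kafka scripts used below run without a bin/ prefix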

Install Java, then start ZooKeeper, which this Kafka release requires, using the bundled configuration file.

[root@host]# yum -y install java-1.8.0
[root@host]# nohup zookeeper-server-start.sh config/zookeeper.properties &

Edit config/server.properties (broker.id, listeners, log.dirs, zookeeper.connect, etc.), then start the broker.

[root@host]# kafka-server-start.sh -daemon config/server.properties
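Before starting the broker it can help to confirm that the settings named above are actually present in the file. The following is a small sanity-check sketch, not an official tool: the helper names are hypothetical, the sample values are illustrative single-node settings, and the parser only handles plain `key=value` lines and `#` comments.

```python
from typing import Dict, List, Sequence

# Assumption: these are the keys this guide treats as essential.
REQUIRED_KEYS = ("broker.id", "log.dirs", "zookeeper.connect")


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_keys(props: Dict[str, str], required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that the configuration does not define."""
    return [key for key in required if key not in props]


sample = """\
# Illustrative single-node settings
broker.id=0
listeners=PLAINTEXT://:9092
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181
"""

props = parse_properties(sample)
print(missing_keys(props))   # []
```

To check a real file, pass the result of `open("config/server.properties").read()` to `parse_properties` instead of the sample string.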

3. Simple Operations

3.1 Create a Topic

kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic along

3.2 Produce Messages

kafka-console-producer.sh --broker-list localhost:9092 --topic along
>First message
>Second message

3.3 Consume Messages

kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic along

4. Multi‑Broker Cluster

Copy server.properties to server-1.properties and server-2.properties. Give each copy a unique broker.id, its own listeners port, and its own log.dirs path so that brokers running on the same host do not collide, then start each broker.

[root@host]# kafka-server-start.sh -daemon config/server-1.properties
[root@host]# kafka-server-start.sh -daemon config/server-2.properties
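The three per-broker settings follow a regular pattern, which the sketch below generates. The helper names, the base port 9092, and the /tmp log root are assumptions chosen for illustration; use whatever ports and directories suit your host.

```python
from typing import Dict


def broker_overrides(broker_id: int, base_port: int = 9092, log_root: str = "/tmp") -> Dict[str, object]:
    """Settings that must differ for each broker sharing one host."""
    return {
        "broker.id": broker_id,
        "listeners": f"PLAINTEXT://:{base_port + broker_id}",
        "log.dirs": f"{log_root}/kafka-logs-{broker_id}",
    }


def render_properties(settings: Dict[str, object]) -> str:
    """Format the settings as key=value lines for a .properties file."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


for broker_id in (1, 2):
    print(f"# config/server-{broker_id}.properties")
    print(render_properties(broker_overrides(broker_id)))
```

Each generated block would replace the matching lines in the corresponding copy of server.properties; every other setting can stay identical across brokers.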

Verify replication and leader election with kafka-topics.sh --describe, which lists the leader, the replicas, and the in-sync replicas (Isr) for each partition.

5. Kafka Connect

Use connect-standalone.sh with a source file connector and a sink file connector to move data between a local file and a Kafka topic.

connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties

Confirm that the data arrived by inspecting the sink file and by reading the topic with the console consumer.
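Conceptually, the two connectors form a pipeline: file to topic, then topic to file. The toy model below imitates that flow with a plain Python list standing in for the topic. It does not use the Kafka Connect API, and the function and file names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def file_source(path: Path, topic: List[str]) -> None:
    """Publish each line of the source file as one record on the topic."""
    for line in path.read_text().splitlines():
        topic.append(line)


def file_sink(topic: List[str], path: Path) -> None:
    """Write every record on the topic to the sink file, one per line."""
    path.write_text("".join(record + "\n" for record in topic))


with tempfile.TemporaryDirectory() as tmp:
    source_path = Path(tmp) / "source.txt"
    sink_path = Path(tmp) / "sink.txt"
    source_path.write_text("foo\nbar\n")

    topic: List[str] = []            # stand-in for the Kafka topic
    file_source(source_path, topic)  # import: file -> topic
    file_sink(topic, sink_path)      # export: topic -> file
    sink_contents = sink_path.read_text()

print(topic)           # ['foo', 'bar']
print(sink_contents)
```

Because the topic sits between the two steps, other consumers could read the same records independently of the sink, which is the main reason to route a file copy through Kafka at all.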


6. Additional Resources

For more details, refer to the original blog post.
