Comprehensive Introduction to Apache Kafka: Concepts, Architecture, Installation, and Usage
This article provides a comprehensive guide to Apache Kafka, covering its core concepts, architecture, key APIs, topics and partitions, deployment steps, multi‑broker clustering, fault tolerance, and data integration using Kafka Connect, with detailed command‑line examples.
Kafka Overview
Kafka is a distributed streaming platform that provides publish/subscribe messaging, fault‑tolerant storage, and real‑time stream processing capabilities.
Core Concepts
Publish and Subscribe – messages are written to and read from topics.
Durable Storage – records are persisted in an append‑only log.
Stream Processing – consumers can process records as they arrive.
Key Terminology
Topic : a logical category of records, possibly spanning multiple partitions.
Partition : an ordered, immutable sequence of records stored as a log file.
Broker : a server that hosts partitions and serves client requests.
Leader / Follower : each partition has one leader handling reads/writes; followers replicate the leader.
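To make the terminology above concrete, here is a minimal in-memory sketch of a topic whose partitions are append-only logs. It is a toy model, not Kafka's implementation: the `Partition` and `Topic` classes are hypothetical names, and CRC32 is used only as a deterministic stand-in for the hash a real producer applies to record keys.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Partition:
    """A toy append-only log: records are only ever added at the end."""
    records: list = field(default_factory=list)

    def append(self, value: str) -> int:
        """Append a record and return its offset (its position in the log)."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset: int) -> list:
        """Return every record from `offset` to the end of the log."""
        return self.records[offset:]


class Topic:
    """A named group of partitions; a given key always maps to one partition."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Assumption: CRC32 is a stand-in chosen for determinism, not Kafka's hash.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key: str, value: str) -> tuple:
        """Append to the partition chosen by the key; return (partition, offset)."""
        index = self.partition_for(key)
        return index, self.partitions[index].append(value)


topic = Topic("along", num_partitions=3)
first = topic.send("user-1", "login")
second = topic.send("user-1", "logout")
print(first, second)
```

Because both records share a key, they land in the same partition with consecutive offsets, which is why ordering is guaranteed within a partition but not across a whole topic.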
Core APIs
Producer API : publish records to topics.
Consumer API : subscribe to topics and read records.
Streams API : build stream processing applications.
Connector API : integrate Kafka with external systems (e.g., databases).
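The division of labor between these APIs can be sketched without a broker. The snippet below is an illustrative model only: a Python list stands in for one topic partition, and the hypothetical `Consumer` class shows the key idea behind the Consumer API, namely that each consumer tracks its own read position, so several consumers can read the same records independently.

```python
class Consumer:
    """Reads a shared log from its own position, independent of other readers."""

    def __init__(self, log: list):
        self.log = log
        self.position = 0  # offset of the next record this consumer will read

    def poll(self) -> list:
        """Return the records this consumer has not seen yet and advance."""
        batch = self.log[self.position:]
        self.position = len(self.log)
        return batch


log = []  # stands in for a single topic partition

# Producer role: publish records by appending to the log.
log.append("This is a message")
log.append("This is another message")

fast = Consumer(log)
slow = Consumer(log)

first_batch = fast.poll()   # the fast consumer reads both records
log.append("A third message")
second_batch = fast.poll()  # only the record added since its last poll
slow_batch = slow.poll()    # the slow consumer still sees all three

# Stream-processing role: derive a new sequence from the consumed records.
shouted = [record.upper() for record in slow_batch]
print(second_batch, len(slow_batch), shouted[0])
```

Reading does not remove records from the log; that is what lets the slow consumer catch up later without affecting the fast one.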
Installation
Download the desired version from kafka.apache.org and extract it.
[root@along ~]# wget http://mirrors.shu.edu.cn/apache/kafka/2.1.0/kafka_2.11-2.1.0.tgz
[root@along ~]# tar -C /data/ -xvf kafka_2.11-2.1.0.tgz
[root@along ~]# cd /data/kafka_2.11-2.1.0/
ZooKeeper Configuration
Kafka 2.1.0 relies on ZooKeeper for cluster coordination, and both run on the JVM, so install Java first:
[root@along ~]# yum -y install java-1.8.0
Then modify config/zookeeper.properties as needed (e.g., dataDir, clientPort).
Kafka Broker Configuration
Edit config/server.properties to set the broker ID, listeners, log directories, and ZooKeeper connection string.
broker.id=0
listeners=PLAINTEXT://localhost:9092
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181
Starting Services
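Start ZooKeeper first, then the Kafka broker (these commands assume Kafka's bin/ directory is on your PATH):
# nohup zookeeper-server-start.sh config/zookeeper.properties &
# nohup kafka-server-start.sh config/server.properties &
Basic Operations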
Create a topic:
# kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic along
Send messages with the console producer:
# kafka-console-producer.sh --broker-list localhost:9092 --topic along
> This is a message
Consume messages with the console consumer:
# kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic along --from-beginning
Multi‑Broker Cluster
Copy server.properties to create server-1.properties and server-2.properties. Give each copy a unique broker.id, its own listeners port, and its own log.dirs directory so the brokers do not collide on one host, then start each broker.
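One way to produce the per-broker files described above is to rewrite the three settings programmatically. This is a minimal sketch under stated assumptions: `derive_broker_properties` is a hypothetical helper, and the ports 9093/9094 and the /tmp/kafka-logs-N directories are illustrative values, not ones taken from this article.

```python
def derive_broker_properties(base_text: str, broker_id: int, port: int, log_dir: str) -> str:
    """Return a copy of a server.properties text with per-broker settings replaced."""
    overrides = {
        "broker.id": str(broker_id),
        "listeners": f"PLAINTEXT://localhost:{port}",
        "log.dirs": log_dir,
    }
    lines = []
    for line in base_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in overrides:
            # Replace the existing setting and mark it as handled.
            lines.append(f"{key}={overrides.pop(key)}")
        else:
            # Comments, blank lines, and unrelated settings pass through unchanged.
            lines.append(line)
    # Append any setting the base file did not define at all.
    lines.extend(f"{key}={value}" for key, value in overrides.items())
    return "\n".join(lines) + "\n"


base = """broker.id=0
listeners=PLAINTEXT://localhost:9092
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181
"""

# Assumed values: any unused ports and distinct directories would do.
server_1 = derive_broker_properties(base, 1, 9093, "/tmp/kafka-logs-1")
server_2 = derive_broker_properties(base, 2, 9094, "/tmp/kafka-logs-2")
print(server_1)
```

The zookeeper.connect line is left untouched on purpose: all brokers must point at the same ZooKeeper ensemble to form one cluster.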
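# nohup kafka-server-start.sh config/server-1.properties &
# nohup kafka-server-start.sh config/server-2.properties &
Verify the cluster with kafka-topics.sh --describe, then test fault tolerance by killing a broker. For a topic created with a replication factor greater than 1, leadership of the affected partitions moves automatically to another in-sync replica; a topic with a replication factor of 1 has no other replica to fail over to.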
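The failover behavior can be illustrated with a small model of one partition's state. This is a simplified sketch, not the controller's actual algorithm: `PartitionState` and `handle_broker_failure` are hypothetical names, the broker IDs are made up, and -1 is used here simply as this sketch's marker for "no leader available".

```python
from dataclasses import dataclass


@dataclass
class PartitionState:
    replicas: list  # broker IDs holding a copy, in preference order
    isr: list       # in-sync replicas: brokers that are fully caught up
    leader: int     # broker serving reads and writes (-1 means none)


def handle_broker_failure(state: PartitionState, failed: int) -> PartitionState:
    """Drop the failed broker from the ISR and elect a new leader if needed."""
    isr = [broker for broker in state.isr if broker != failed]
    leader = state.leader
    if leader == failed:
        # Prefer the first surviving in-sync replica in assignment order.
        candidates = [broker for broker in state.replicas if broker in isr]
        leader = candidates[0] if candidates else -1
    return PartitionState(state.replicas, isr, leader)


# Replication factor 3: the partition survives losing its leader.
replicated = PartitionState(replicas=[1, 2, 0], isr=[1, 2, 0], leader=1)
after_kill = handle_broker_failure(replicated, failed=1)
print(after_kill.leader, after_kill.isr)

# Replication factor 1: there is no other replica to take over.
single = PartitionState(replicas=[0], isr=[0], leader=0)
after_single_kill = handle_broker_failure(single, failed=0)
print(after_single_kill.leader, after_single_kill.isr)
```

The replica list itself does not change when a broker dies; only the ISR and the leader do, which matches what the Leader, Replicas, and Isr columns of the --describe output let you observe.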
Kafka Connect
Kafka Connect enables importing/exporting data without custom code. Run in standalone mode with configuration files:
# connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
Example: the source connector reads lines from test.txt into the topic connect-test; the sink connector writes the topic's records to test.sink.txt.
# echo -e "foo\nbar" > test.txt
# cat test.sink.txt
foo
bar
Consume the topic directly to see the JSON payloads.
# kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning
{"schema":{"type":"string","optional":false},"payload":"foo"}
{"schema":{"type":"string","optional":false},"payload":"bar"}
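Each line printed by the console consumer above is a JSON envelope that pairs a schema with a payload. A downstream application that only wants the original file lines could unwrap them roughly as follows; this is a small sketch in which `extract_payloads` is a hypothetical helper, and it assumes every record uses the envelope shape shown in the output.

```python
import json

# The two records exactly as printed by the console consumer.
raw_lines = [
    '{"schema":{"type":"string","optional":false},"payload":"foo"}',
    '{"schema":{"type":"string","optional":false},"payload":"bar"}',
]


def extract_payloads(lines):
    """Parse each JSON envelope and return only its payload field."""
    payloads = []
    for line in lines:
        envelope = json.loads(line)
        payloads.append(envelope["payload"])
    return payloads


payloads = extract_payloads(raw_lines)
print(payloads)  # → ['foo', 'bar']
```

The recovered values match the contents of test.sink.txt, which shows that the sink connector performs the same unwrapping before it writes each line.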
Top Architect
