
Why Kafka Handles Billions of Messages: Architecture, Use Cases, and Fast Performance

This article introduces Kafka, the high‑throughput distributed messaging system originally developed at LinkedIn. It explains core concepts such as brokers, topics, partitions, offsets, producers, consumers, and consumer groups; outlines common use cases like asynchronous decoupling and data‑stream processing; and details the mechanisms behind its performance, its fault tolerance, and basic installation and configuration steps.

MaGe Linux Operations

Jun 3, 2021


What Is Kafka?

Kafka is a high‑throughput, distributed messaging system based on the publish‑subscribe model. It was originally developed at LinkedIn and is now an Apache Software Foundation project.

Application Scenarios

Asynchronous Decoupling: Suitable for business flows where downstream steps do not depend strongly on upstream ones, or where messages do not need to be processed immediately.

System Buffering: Absorbs throughput mismatches between fast upstream producers and slower downstream services.

Peak‑Shaving: Protects backend services from short‑term traffic spikes by queuing the excess load so it can be processed at a steady rate.

Data Stream Processing: Can be integrated with Spark for real‑time stream analytics.
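Buffering and peak‑shaving come down to simple arithmetic: the queue absorbs whatever the consumer cannot handle yet. The toy Python simulation below is only an illustration. It assumes a made‑up arrival pattern and a consumer capacity of 30 messages per tick. The function name simulate is hypothetical, and no Kafka client is involved.

```python
# Toy model of peak shaving: a queue (standing in for a Kafka topic) absorbs a
# burst of writes while the consumer drains it at a fixed rate.
# The arrival pattern and capacity are illustrative assumptions, not measurements.
from typing import List, Tuple


def simulate(arrivals: List[int], capacity: int) -> Tuple[List[int], List[int]]:
    """Return (messages processed per tick, backlog left in the queue after each tick)."""
    backlog = 0
    processed, backlogs = [], []
    for incoming in arrivals:
        backlog += incoming            # producers enqueue everything immediately
        done = min(backlog, capacity)  # the consumer never exceeds its capacity
        backlog -= done
        processed.append(done)
        backlogs.append(backlog)
    return processed, backlogs


if __name__ == "__main__":
    arrivals = [10, 10, 80, 60, 10, 0, 0, 0]  # short spike at ticks 2 and 3
    processed, backlogs = simulate(arrivals, capacity=30)
    print("processed:", processed)  # never more than 30 per tick
    print("backlog:  ", backlogs)   # grows during the spike, then drains to 0
```

With these inputs the consumer processes [10, 10, 30, 30, 30, 30, 30, 0] messages per tick. The backlog peaks at 80 and drains back to 0, so the backend never sees more than its capacity.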

Kafka Topology (Multi‑Replica Mechanism)

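Each partition is replicated across several brokers according to the topic's replication factor. One replica serves as the leader and the others act as followers. Cluster metadata and coordination are managed by Zookeeper.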
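As a rough illustration of multi‑replica placement, the sketch below spreads each partition's replicas over consecutive brokers in round‑robin fashion. It is a simplification under stated assumptions, not Kafka's exact assignment algorithm. The broker IDs and the function name are hypothetical.

```python
# Simplified round-robin placement of partition replicas across brokers.
# NOT Kafka's exact algorithm (Kafka also randomizes the starting broker and
# shifts follower placement); broker IDs are hypothetical.
from typing import Dict, List


def assign_replicas(num_partitions: int, replication_factor: int,
                    brokers: List[int]) -> Dict[int, List[int]]:
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    n = len(brokers)
    assignment = {}
    for p in range(num_partitions):
        # Consecutive brokers, so no broker holds two copies of one partition.
        # The first entry acts as the preferred leader; the rest are followers.
        assignment[p] = [brokers[(p + i) % n] for i in range(replication_factor)]
    return assignment


if __name__ == "__main__":
    for partition, replicas in assign_replicas(3, 2, [0, 1, 2]).items():
        print(f"partition {partition}: leader={replicas[0]} followers={replicas[1:]}")
```

For 3 partitions, replication factor 2, and brokers 0 to 2, this yields {0: [0, 1], 1: [1, 2], 2: [2, 0]}. Each broker leads one partition and follows another, so losing any single broker leaves a copy of every partition.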

Core Components

Broker : A Kafka server that stores messages and serves them to clients. Each broker is one node in the cluster and can host partitions of many topics.

Topic : Logical category for messages; Kafka classifies messages by topic.

Partition : A topic can be split into many partitions. Messages are stored in partitions and are ordered only within a partition. Spreading partitions across brokers enables parallel processing and high throughput.

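Each partition is stored on disk as a sequence of segment files. New messages are always appended to the active segment, which rolls over to a new file once it reaches a configured size (log.segment.bytes, 1 GB by default) or age. These sequential writes contribute to Kafka's speed.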

Offset : The sequential position of a message within a partition's log. It uniquely identifies the message within that partition and is the basis for tracking consumer progress and for leader‑follower synchronization.

Producer : Client that sends messages to a Kafka broker.

Consumer : Client that reads messages from a broker.

Consumer Group : A set of consumers sharing a group ID; each partition is consumed by only one member of the group.

Zookeeper : Coordinates the Kafka cluster. It stores cluster metadata (brokers, topics, partitions, configuration), detects broker failures, and is used to elect the controller broker, which in turn handles partition leader election.
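The Partition and Offset definitions can be made concrete with an in‑memory toy model of a single topic. The ToyTopic class is hypothetical and makes three simplifying assumptions. Keys are routed with zlib.crc32 purely as a stand‑in (Kafka's default partitioner hashes keys with murmur2). Offsets start at 0. Nothing is ever deleted.

```python
# Hypothetical in-memory model of one topic: key-based routing plus per-partition offsets.
import zlib
from typing import List, Tuple


class ToyTopic:
    def __init__(self, num_partitions: int):
        # Each partition is an append-only list; a record's index is its offset.
        self.partitions: List[List[Tuple[bytes, str]]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Same key -> same partition. CRC32 is a stand-in; Kafka's default
        # partitioner hashes keys with murmur2.
        return zlib.crc32(key) % len(self.partitions)

    def send(self, key: bytes, value: str) -> Tuple[int, int]:
        """Append a record; return (partition, offset) like a produce acknowledgement."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def fetch(self, partition: int, offset: int, max_records: int = 10):
        """Return (records, next_offset) starting at `offset` in one partition."""
        batch = self.partitions[partition][offset:offset + max_records]
        return batch, offset + len(batch)


if __name__ == "__main__":
    topic = ToyTopic(num_partitions=3)
    for key, value in [(b"user-1", "login"), (b"user-2", "login"), (b"user-1", "logout")]:
        print(key, value, "->", topic.send(key, value))
    p, _ = topic.send(b"user-1", "login-again")
    records, next_offset = topic.fetch(p, 0)
    print(records, "next offset:", next_offset)
```

Every record with the same key lands in the same partition, and offsets grow by one per append. As a result, a consumer reading one partition sees that key's records in the order they were written.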
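The Consumer Group rule (each partition is read by exactly one member) can be sketched with a range‑style assignment for a single topic. The sketch is similar in spirit to Kafka's RangeAssignor but is not its actual code. The function and consumer names are hypothetical.

```python
# Range-style assignment of one topic's partitions to the members of a consumer
# group, similar in spirit to Kafka's RangeAssignor. Consumer names are hypothetical.
from typing import Dict, List


def range_assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    members = sorted(consumers)  # deterministic order across the group
    per_member, extra = divmod(len(partitions), len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)  # first `extra` members get one more
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment


if __name__ == "__main__":
    print(range_assign([0, 1, 2, 3, 4], ["consumer-b", "consumer-a"]))
    # More consumers than partitions: the surplus member receives nothing.
    print(range_assign([0, 1], ["c1", "c2", "c3"]))
```

When a group has more members than partitions, the extra members sit idle. This is why the partition count caps a group's consumption parallelism.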

Service Governance

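Kafka ensures data reliability through leader‑follower replication. Producers write to a partition's leader, and followers fetch the data from the leader to stay in sync. With acks=all, the leader acknowledges a write only after every replica in the ISR (in‑sync replica) set has received it. A follower that falls too far behind (it has not caught up with the leader within replica.lag.time.max.ms) is removed from the ISR until it catches up again.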
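The ISR rules can be sketched as two small functions. The first drops followers that have not caught up recently. The second computes the high watermark, the offset below which records exist on every in‑sync replica. The sketch assumes max_lag_ms stands in for Kafka's replica.lag.time.max.ms; the broker names, timestamps, and offsets are made up.

```python
# Toy model of ISR maintenance and the high watermark for one partition.
# max_lag_ms plays the role of Kafka's replica.lag.time.max.ms; the 30 s value
# and all broker names/offsets below are illustrative assumptions.
from typing import Dict, List


def shrink_isr(isr: List[str], last_caught_up_ms: Dict[str, int],
               now_ms: int, max_lag_ms: int = 30_000) -> List[str]:
    """Keep only replicas that were fully caught up with the leader recently enough."""
    return [r for r in isr if now_ms - last_caught_up_ms[r] <= max_lag_ms]


def high_watermark(isr: List[str], log_end_offsets: Dict[str, int]) -> int:
    """Every offset below this value is on all ISR members, i.e. committed.
    With acks=all, a write is acknowledged once it falls below this mark."""
    return min(log_end_offsets[r] for r in isr)


if __name__ == "__main__":
    now = 100_000
    caught_up = {"leader": now, "f1": 95_000, "f2": 40_000}  # f2 is 60 s behind
    leo = {"leader": 120, "f1": 118, "f2": 90}               # next offset each will write
    isr = shrink_isr(["leader", "f1", "f2"], caught_up, now)
    print("ISR:", isr)                                       # f2 removed
    print("high watermark:", high_watermark(isr, leo))
```

In the example, follower f2 is 60 seconds behind and leaves the ISR. The high watermark therefore rises from 90 to 118, and acks=all writes no longer wait on the slow replica.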

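Fault recovery is coordinated through Zookeeper, which elects one broker as the controller. If a partition leader fails, the controller chooses a new leader from the remaining in‑sync replicas. Producers and consumers then refresh their metadata and reconnect to the new leader.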
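In Kafka, the replacement leader is chosen by the controller broker from the partition's in‑sync replicas, preferring the order in which replicas were assigned. The following is a minimal sketch with hypothetical broker IDs. It leaves out all of the controller's metadata handling.

```python
# Sketch of choosing a new partition leader after a failure: take the first
# replica in the assignment order that is alive and still in the ISR.
# Broker IDs are hypothetical; controller and metadata details are ignored.
from typing import List, Optional, Set


def elect_leader(assigned: List[int], isr: Set[int],
                 live_brokers: Set[int]) -> Optional[int]:
    for broker in assigned:  # assignment order = preference order
        if broker in isr and broker in live_brokers:
            return broker
    # No in-sync replica is alive: with unclean leader election disabled,
    # the partition stays offline rather than risk losing committed data.
    return None


if __name__ == "__main__":
    # Broker 1 (the old leader) has failed; brokers 2 and 3 are still up.
    print(elect_leader([1, 2, 3], isr={1, 2, 3}, live_brokers={2, 3}))
```

Returning None models the default behaviour when unclean.leader.election.enable is false. In that case the partition stays unavailable rather than electing an out‑of‑sync replica that could be missing committed messages.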

Why Kafka Is So Fast

Sequential Disk Writes : Kafka writes logs sequentially, avoiding random‑seek overhead.

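Page Cache : Kafka relies on the operating system's page cache rather than caching messages in the JVM heap. This avoids garbage‑collection overhead and lets recently written data be served from memory.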

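Zero‑Copy : When serving consumers, Kafka uses Java's FileChannel.transferTo(), which on Linux maps to the sendfile() system call. Data moves from the page cache to the socket inside the kernel without being copied into user space, which saves memory copies and context switches.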

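Partition Segmentation : Each partition is split into segment files named after the first offset they contain, and each segment has a sparse offset index. To read from a given offset, Kafka binary‑searches for the right segment and then uses its index to find the position. Old data can also be deleted one whole segment at a time.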

Data Compression : Producers can compress whole batches of messages with Gzip, Snappy, LZ4, or (since Kafka 2.1) Zstandard, reducing network bandwidth and disk usage.
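Zero‑copy can be observed from Python: os.sendfile() wraps the same sendfile() system call that Java's FileChannel.transferTo() uses on Linux. The sketch below assumes a Unix system. A temporary file stands in for a log segment and a local socket pair stands in for a consumer connection. The file is sent without ever being read into the process's memory. The function names are hypothetical.

```python
# Send a file to a socket with os.sendfile(), the same sendfile() system call
# Java's FileChannel.transferTo() uses on Linux. The kernel moves bytes from the
# page cache to the socket, so they are never copied into this process's memory.
# Unix-only; the file contents and socketpair below are illustrative.
import os
import socket
import tempfile
from typing import Tuple


def send_file_zero_copy(path: str, sock: socket.socket) -> int:
    """Send the whole file at `path` over `sock`; return the number of bytes sent."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            if sent == 0:  # nothing more could be sent (e.g. the file shrank)
                break
            offset += sent
    return offset


def demo() -> Tuple[int, bytes]:
    payload = b"segment-data" * 100
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    sender, receiver = socket.socketpair()
    try:
        sent = send_file_zero_copy(path, sender)
        sender.close()  # signal EOF to the receiving side
        received = b""
        while chunk := receiver.recv(4096):
            received += chunk
        return sent, received
    finally:
        sender.close()  # closing an already-closed socket is a no-op
        receiver.close()
        os.remove(path)


if __name__ == "__main__":
    sent, received = demo()
    print(f"sent {sent} bytes, received {len(received)} bytes")
```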
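Finding the segment for an offset is a binary search over the segments' base offsets, which also name the files (zero‑padded to 20 digits, e.g. 00000000000000000000.log). The sketch below shows only that first step and uses made‑up offsets. It omits the per‑segment sparse index that Kafka consults next. The function names are hypothetical.

```python
# Locate the segment that holds a given offset by binary search over the
# segments' base offsets. Only the first step of the lookup is shown; the
# per-segment sparse .index file (second step) is omitted. Offsets are made up.
import bisect
from typing import List


def segment_file_name(base_offset: int) -> str:
    """Segment files are named after their base offset, zero-padded to 20 digits."""
    return f"{base_offset:020d}.log"


def find_segment(base_offsets: List[int], target: int) -> int:
    """Return the base offset of the segment containing `target`.
    `base_offsets` must be sorted; offsets past the log end are not checked here."""
    i = bisect.bisect_right(base_offsets, target) - 1
    if i < 0:
        raise KeyError(f"offset {target} is older than the earliest segment")
    return base_offsets[i]


if __name__ == "__main__":
    segments = [0, 1000, 2500]
    for offset in (0, 999, 1500, 2500, 3100):
        print(offset, "->", segment_file_name(find_segment(segments, offset)))
```

Because the lookup is logarithmic in the number of segments, a partition with many segment files can still locate an old offset quickly.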
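Compression pays off largely because Kafka producers compress whole record batches (the compression.type setting), and similar messages compress far better together than one at a time. The sketch below compares the two approaches. It uses Python's gzip module as a stand‑in codec and invented JSON events; the function name is hypothetical.

```python
# Compare compressing messages one by one with compressing them as one batch.
# gzip from the Python standard library stands in for the codec; the events are made up.
import gzip
import json
from typing import Dict, List, Tuple


def compressed_sizes(messages: List[Dict[str, str]]) -> Tuple[int, int, int]:
    """Return (raw bytes, total bytes gzipped individually, bytes gzipped as one batch)."""
    encoded = [json.dumps(m).encode("utf-8") for m in messages]
    raw = sum(len(e) for e in encoded)
    individual = sum(len(gzip.compress(e)) for e in encoded)
    batch = len(gzip.compress(b"\n".join(encoded)))
    return raw, individual, batch


if __name__ == "__main__":
    events = [{"user": f"user-{i % 10}", "event": "page_view", "page": "/home"}
              for i in range(200)]
    raw, individual, batch = compressed_sizes(events)
    print(f"raw={raw} individually={individual} batched={batch}")
```

Compressing each small message separately pays the codec's header overhead every time and finds little repetition to exploit. A batch shares that overhead and exposes the repeated field names and values.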

Installation Guide

1. Install JDK

yum list 'java*'

yum install java-1.8.0-openjdk-devel.x86_64

java -version

2. Install Zookeeper

tar -zxvf apache-zookeeper-3.7.0-bin.tar.gz -C /usr/local/src

cd /usr/local/src/apache-zookeeper-3.7.0-bin/conf

cp zoo_sample.cfg zoo.cfg

vim zoo.cfg   # edit configuration as needed

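Example zoo.cfg contents (these match the sample defaults; /tmp may be wiped on reboot, so use a persistent dataDir outside of testing):

tickTime=2000

initLimit=10

syncLimit=5

dataDir=/tmp/zookeeper

clientPort=2181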

export ZK=/usr/local/src/apache-zookeeper-3.7.0-bin

export PATH=$PATH:$ZK/bin

zkServer.sh start

3. Install Kafka

wget https://archive.apache.org/dist/kafka/2.8.0/kafka_2.12-2.8.0.tgz

tar -xzvf kafka_2.12-2.8.0.tgz -C /usr/local/src

export KAFKA=/usr/local/src/kafka_2.12-2.8.0

export PATH=$PATH:$KAFKA/bin

nohup kafka-server-start.sh $KAFKA/config/server.properties &

After these steps, Kafka is up and running.
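A quick TCP check is enough for a first test that both services are listening. This sketch assumes the defaults used in this guide: Zookeeper on localhost:2181 (clientPort) and the broker on localhost:9092, Kafka's default listener port. An open port only shows that something accepts connections, not that the cluster is healthy. The function name is hypothetical.

```python
# Quick TCP reachability check for the services started above. Assumes the
# defaults used in this guide: Zookeeper on localhost:2181 (clientPort) and a
# Kafka broker on localhost:9092 (Kafka's default listener port). An open port
# only shows that something is listening, not that the service is healthy.
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    for name, port in (("Zookeeper", 2181), ("Kafka", 9092)):
        state = "listening" if is_listening("localhost", port) else "not reachable"
        print(f"{name} on localhost:{port}: {state}")
```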


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
