
Kafka Fundamentals: Architecture, Replication, Partitioning, and Performance

This article provides a comprehensive overview of Kafka, covering its role as a message middleware, core concepts, architecture, replication management, partition handling, producer sending modes, partition assignment strategies, load balancing, reliability mechanisms, consumer models, controller election, and factors that affect its high throughput and potential message loss scenarios.

Code Ape Tech Column

1. What is a message middleware?

Message middleware is a supporting software system based on queue and message‑passing technology that provides synchronous or asynchronous, reliable message transmission for applications in a network environment.

It utilizes an efficient and reliable messaging mechanism for platform‑independent data exchange and integrates distributed systems by offering message passing and queuing models.

2. What is Kafka? What does it do?

Kafka is a distributed streaming platform known for high throughput, persistence, horizontal scalability, and support for stream processing.

Its main functions are reflected in three points:

Message system : Kafka, like traditional message middleware, provides system decoupling, redundant storage, traffic shaping, buffering, asynchronous communication, scalability, and recoverability, and additionally offers message ordering guarantees and replay consumption.

Storage system : Kafka persists messages to disk, reducing the risk of loss compared with in‑memory systems, and can serve as long‑term storage when retention policies such as "permanent" or log‑compaction are enabled.

Streaming processing platform : Kafka supplies reliable data sources for popular stream‑processing frameworks and provides a complete streaming API (windowing, joins, transformations, aggregations, etc.).

3. What does Kafka’s architecture look like?

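A typical Kafka architecture includes several Producers, several Brokers, several Consumers, and a ZooKeeper cluster that stores cluster metadata. KRaft mode, introduced as early access in version 2.8.0, replaces ZooKeeper with a built-in Raft quorum; ZooKeeper support was removed entirely in version 4.0.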

Producers send messages to Brokers; Brokers store the received messages on disk; Consumers subscribe to Brokers to consume messages.

Key concepts include:

Producer : sends messages to a Broker.

Consumer : receives messages from a Broker.

Consumer Group : a logical subscriber composed of multiple Consumers; each partition is consumed by only one Consumer in the group.

Broker : an individual Kafka service node or instance.

Topic : a logical category that contains many Partitions.

Partition : an ordered queue that enables scalability by spreading a large topic across multiple Brokers.

Replica : identical copies of a partition stored on different Brokers to ensure fault tolerance.

Leader : the primary replica of a partition; Producers and Consumers interact only with the Leader.

Follower : secondary replicas that continuously sync from the Leader; if the Leader fails, a Follower is elected as the new Leader.

4. How are Kafka Replicas managed?

AR (Assigned Replicas): all replicas of a partition; AR = ISR + OSR.

ISR (In‑Sync Replicas): replicas that are sufficiently synchronized with the Leader, including the Leader itself.

OSR (Out‑of‑Sync Replicas): replicas that have fallen behind the Leader beyond an acceptable threshold.

The Leader tracks the lag of each Follower in the ISR set; when a Follower has not caught up within the time set by replica.lag.time.max.ms, it is moved to OSR, and when it catches up, it returns to ISR.

By default, only replicas in ISR are eligible to be promoted to Leader (unclean.leader.election.enable is false).
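The ISR/OSR bookkeeping can be sketched as a small simulation. This is a minimal illustration rather than broker code: the Follower class, the split_isr function, and the 30-second threshold are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Follower:
    broker_id: int
    last_caught_up_ms: int  # last time this follower was fully caught up with the leader


def split_isr(leader_id, followers, now_ms, max_lag_ms=30_000):
    """Return (isr, osr) lists of broker ids; the leader always belongs to the ISR."""
    isr, osr = [leader_id], []
    for f in followers:
        if now_ms - f.last_caught_up_ms <= max_lag_ms:
            isr.append(f.broker_id)
        else:
            osr.append(f.broker_id)
    return isr, osr


followers = [Follower(2, last_caught_up_ms=95_000), Follower(3, last_caught_up_ms=40_000)]
isr, osr = split_isr(leader_id=1, followers=followers, now_ms=100_000)
print(isr, osr)  # [1, 2] [3]
```

Broker 3 last caught up 60 seconds ago, so it drops to OSR and is not a candidate for leadership until it catches up again.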

5. How to determine the latest readable message?

A partition is an append-only log. Consider a partition that holds seven messages, at offsets 0 to 6:

Offset 0 marks the start of the log.

LEO (Log End Offset) indicates the next offset to be written; here it is 7.

HW (High Watermark) = 4, meaning offsets 0-3 are consumable and offsets 4-6 are not yet visible to consumers.

Each replica in the ISR maintains its own LEO; the smallest LEO among ISR members becomes the partition’s HW.
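The rule "HW is the smallest LEO in the ISR" is plain arithmetic. In this small sketch the broker ids and LEO values are made up to reproduce the HW = 4 example.

```python
def high_watermark(leo_by_replica, isr):
    """The partition's HW is the smallest LEO among the in-sync replicas."""
    return min(leo_by_replica[r] for r in isr)


# Hypothetical LEOs: broker 0 is the leader with seven messages (offsets 0-6).
leo_by_replica = {0: 7, 1: 4, 2: 5, 3: 2}
isr = [0, 1, 2]  # broker 3 has fallen out of sync, so its LEO is ignored

hw = high_watermark(leo_by_replica, isr)
consumable = list(range(hw))
print(hw, consumable)  # 4 [0, 1, 2, 3]
```

Because broker 3 is outside the ISR, its low LEO does not hold back the high watermark.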

6. What sending modes do producers have?

There are three modes:

Fire‑and‑forget : the producer sends messages without caring about delivery success; highest throughput but lowest reliability.

Synchronous (sync) : producer.send() returns a Future; calling get() blocks until the broker acknowledges the write, ensuring success before sending the next message.

Asynchronous (async) : the producer provides a callback; the callback is invoked on success or failure, allowing the application to log or retry as needed.
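The three modes differ only in what the caller does with the result of send(). The sketch below uses a hypothetical FakeProducer built on Python's standard Future, so it mimics the shape of the calls without touching a real Kafka client.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeProducer:
    """Hypothetical stand-in for a producer: send() returns a Future, nothing more."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)  # one worker keeps sends ordered
        self._next_offset = 0

    def _write(self, value):
        if value is None:
            raise ValueError("null payload rejected")
        offset = self._next_offset
        self._next_offset += 1
        return offset

    def send(self, value):
        return self._pool.submit(self._write, value)

    def close(self):
        self._pool.shutdown(wait=True)


producer = FakeProducer()

# 1. Fire-and-forget: the returned future is ignored, so a failure would go unnoticed.
producer.send("a")

# 2. Synchronous: block on the future before sending the next message.
sync_offset = producer.send("b").result(timeout=5)

# 3. Asynchronous: register a callback and keep sending.
results = []


def on_done(future):
    err = future.exception()
    results.append(("error", str(err)) if err else ("ok", future.result()))


producer.send("c").add_done_callback(on_done)
producer.send(None).add_done_callback(on_done)
producer.close()
print(sync_offset, results)
```

The callback receives both the success and the failure, which is where an application would log or schedule a retry.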

7. What are the partitioning strategies for sending messages?

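Round-robin : when the key is null, messages are spread across all partitions. Older clients rotate per message; since Kafka 2.4 the default partitioner instead sticks to one partition per batch and switches partitions between batches, which still evens out over time.

Key-based : a non-null key is hashed and mapped to a specific partition, guaranteeing ordering for the same key as long as the partition count does not change.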

Custom partitioner : implement the Partitioner interface to define your own strategy.

Explicit partition : specify the target partition directly.
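The order in which these strategies apply can be sketched as follows. SketchPartitioner is a hypothetical class, and zlib.crc32 is only a stand-in hash (Kafka's Java client uses murmur2), so the partition numbers it produces will not match a real cluster.

```python
import zlib


class SketchPartitioner:
    """Hypothetical partitioner: explicit partition first, then key hash, then rotation."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._counter = 0

    def partition(self, key=None, explicit=None):
        if explicit is not None:
            return explicit  # the caller named the partition directly
        if key is not None:
            # Stand-in hash; the same key always maps to the same partition.
            return zlib.crc32(key.encode("utf-8")) % self.num_partitions
        chosen = self._counter % self.num_partitions  # simple per-message round-robin
        self._counter += 1
        return chosen


p = SketchPartitioner(num_partitions=3)
null_key_partitions = [p.partition() for _ in range(4)]
same_key = (p.partition(key="order-42"), p.partition(key="order-42"))
print(null_key_partitions, same_key[0] == same_key[1], p.partition(key="x", explicit=2))
```

Changing num_partitions changes the modulo result, which is why key ordering guarantees only hold while the partition count stays fixed.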

8. Does Kafka support read‑write separation?

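By default, Kafka does **not** use read-write separation: producers and consumers both talk to the partition leader. While read-write separation can improve load balancing, Kafka’s “master-write master-read” architecture already achieves a degree of load distribution without the consistency and latency drawbacks of separate read/write nodes. Since Kafka 2.4, consumers can optionally fetch from a follower in the same rack, a feature aimed at cutting cross-datacenter traffic rather than at general read scaling.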

9. How does Kafka achieve load balancing?

Kafka balances load primarily through partition distribution. Each broker hosts leaders and followers for multiple partitions, and both producers and consumers interact with every broker, resulting in balanced read and write traffic.

10. What load‑balancing problems can arise?

Uneven broker partition allocation when creating topics.

Producers writing disproportionately to certain leaders.

Consumers pulling heavily from specific leaders.

Leader election or partition reassignment causing uneven leader distribution.

11. How does Kafka guarantee reliability?

Reliability is ensured through several mechanisms:

acks configuration:

acks=0: fire-and-forget, highest loss risk.

acks=1: success when the leader writes the message.

acks=-1 or acks=all: success only after all ISR replicas acknowledge, providing the highest durability.

Producer sending modes (sync/async) with retry logic.

Manual offset commits to avoid losing messages when processing fails.

Using the smallest LEO in ISR as the partition’s HW.
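The three acks settings can be compared with a small decision function. This is a simplified model, not broker logic: producer_sees_success and its parameters are hypothetical names, and the min_insync_replicas argument mirrors the min.insync.replicas setting only in spirit.

```python
def producer_sees_success(acks, leader_wrote, isr_acked, isr_size, min_insync_replicas=1):
    """Return True if the producer would treat the send as successful (simplified)."""
    if acks == 0:
        return True  # no broker response is awaited at all
    if acks == 1:
        return leader_wrote  # only the leader's write matters
    if acks in (-1, "all"):
        return (
            leader_wrote
            and isr_size >= min_insync_replicas
            and isr_acked == isr_size  # every in-sync replica has the message
        )
    raise ValueError(f"unsupported acks value: {acks!r}")


# The leader wrote the message, but only 2 of 3 in-sync replicas have it so far.
outcomes = {
    acks: producer_sees_success(acks, leader_wrote=True, isr_acked=2, isr_size=3)
    for acks in (0, 1, "all")
}
print(outcomes)  # {0: True, 1: True, 'all': False}
```

With acks=0 the producer reports success even when the leader never wrote the message, which is why that setting carries the highest loss risk.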

12. What consumption models does Kafka support?

Kafka uses a pull model. Two consumption patterns exist:

Point-to-point : when all consumers belong to the same consumer group, the topic's partitions are divided among them, so each message is processed by only one consumer in the group.

Publish-subscribe : when consumers belong to different groups, each group receives every message.
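Both patterns fall out of the same rule: within a group each partition has one owner, and every group reads all partitions. A minimal sketch, with made-up group names and a simplified modulo assignment standing in for the real assignor:

```python
def deliver(partitions, groups):
    """partitions: {partition_id: [messages]}; groups: {group_name: [consumer_names]}.

    Returns {group_name: {consumer_name: [messages received]}}.
    """
    delivered = {}
    for group, consumers in groups.items():
        received = {c: [] for c in consumers}
        for i, partition_id in enumerate(sorted(partitions)):
            owner = consumers[i % len(consumers)]  # one owner per partition per group
            received[owner].extend(partitions[partition_id])
        delivered[group] = received
    return delivered


partitions = {0: ["m0", "m2"], 1: ["m1", "m3"]}
groups = {"billing": ["b1", "b2"], "audit": ["a1"]}
result = deliver(partitions, groups)
print(result["billing"])  # {'b1': ['m0', 'm2'], 'b2': ['m1', 'm3']}
print(result["audit"])    # {'a1': ['m0', 'm2', 'm1', 'm3']}
```

Inside "billing" the messages are split between two consumers (point-to-point), while "audit" independently receives all four (publish-subscribe).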

13. What is partition reassignment and why is it needed?

Partition reassignment maintains load balance when brokers fail or new brokers are added. It copies data from old replicas to new ones and then removes the old replicas, optionally throttling the copy process to protect performance.

14. How is a replica leader elected?

When a leader fails, one of its followers becomes the new leader, which may cause temporary load imbalance.

Topic: test Partition:0 Leader:1 Replicas:1,2,0 ISR:1,2,0
Topic: test Partition:1 Leader:2 Replicas:2,0,1 ISR:2,0,1
Topic: test Partition:2 Leader:0 Replicas:0,1,2 ISR:0,1,2

If a broker restarts, the ISR composition may change, leading to different leader assignments.
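The imbalance can be reproduced with the three-partition listing above. The sketch assumes a simplified rule, namely that the new leader is the first replica in the assigned order that is both alive and in the ISR; elect_leader is a hypothetical helper, not controller code.

```python
def elect_leader(assigned_replicas, isr, live_brokers):
    """Pick the first assigned replica that is alive and in sync, or None if there is none."""
    for replica in assigned_replicas:
        if replica in isr and replica in live_brokers:
            return replica
    return None


# Replica lists taken from the listing for topic "test".
topic = {0: [1, 2, 0], 1: [2, 0, 1], 2: [0, 1, 2]}
live_brokers = {0, 2}  # broker 1 has failed

leaders = {}
for partition, replicas in topic.items():
    isr = [r for r in replicas if r in live_brokers]  # the ISR shrinks to the survivors
    leaders[partition] = elect_leader(replicas, isr, live_brokers)

print(leaders)  # {0: 2, 1: 2, 2: 0}
```

Broker 2 now leads two partitions and broker 0 leads one, which is the temporary load imbalance described above.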

15. Does increasing the number of partitions always improve throughput?

More partitions can increase throughput up to a point, but excessive partitions raise memory usage, file‑handle consumption, replication traffic, and recovery time, ultimately degrading performance.

16. How to enhance consumer processing capability?

Increase the number of partitions and match the consumer count to the partition count; consumers in a group beyond the partition count stay idle.

Use multithreading and optimize business logic to improve per‑consumer throughput.

17. Consumer‑topic partition assignment strategies

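1. RangeAssignor : works topic by topic, dividing each topic's partitions by the consumer count and assigning contiguous ranges; when the division is uneven, the first consumers receive one extra partition.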

2. RoundRobinAssignor : sorts all partitions and consumers alphabetically and distributes them in a round‑robin fashion.

3. StickyAssignor : aims for even distribution while preserving previous assignments as much as possible, reducing rebalancing churn.

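4. Custom assignor: implement org.apache.kafka.clients.consumer.ConsumerPartitionAssignor (Kafka 2.4 and later; older clients used org.apache.kafka.clients.consumer.internals.PartitionAssignor) to define bespoke logic.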
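The difference between the range and round-robin strategies is easiest to see on a single topic with seven partitions and three consumers. The functions below are simplified sketches of the ideas, not the client's implementations, and the consumer names are made up.

```python
def range_assign(partitions, consumers):
    """Contiguous ranges; the first (len(partitions) % len(consumers)) consumers get one extra."""
    consumers = sorted(consumers)
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        size = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + size]
        start += size
    return assignment


def round_robin_assign(partitions, consumers):
    """Deal sorted partitions out to sorted consumers one at a time."""
    consumers = sorted(consumers)
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


partitions = list(range(7))
consumers = ["c1", "c2", "c3"]
by_range = range_assign(partitions, consumers)
by_round_robin = round_robin_assign(partitions, consumers)
print(by_range)        # {'c1': [0, 1, 2], 'c2': [3, 4], 'c3': [5, 6]}
print(by_round_robin)  # {'c1': [0, 3, 6], 'c2': [1, 4], 'c3': [2, 5]}
```

With range assignment the extra partition always lands on the first consumers, and that skew repeats for every subscribed topic; round-robin spreads partitions across the whole subscription instead.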

18. What is the Kafka controller and its role?

The controller (a single broker elected among the cluster) manages partition and replica state, handles leader elections, notifies brokers of ISR changes, and performs partition reassignment when topics are expanded.

19. How is the controller elected?

In ZooKeeper mode, controller election relies on ZooKeeper. Brokers attempt to create the ephemeral /controller znode; the broker that succeeds becomes the controller. The persistent /controller_epoch node tracks the number of controller changes, and requests with stale epoch values are rejected. In KRaft mode, the controller is instead elected by a Raft quorum of controller nodes.
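The "first to create the node wins" race and the epoch check can be sketched with an in-memory stand-in. FakeZooKeeper and accept_request are hypothetical names; no real ZooKeeper client is involved.

```python
class FakeZooKeeper:
    """In-memory stand-in for the /controller and /controller_epoch nodes."""

    def __init__(self):
        self.controller = None     # models the ephemeral /controller znode
        self.controller_epoch = 0  # models the persistent /controller_epoch znode

    def try_create_controller(self, broker_id):
        if self.controller is not None:
            return False  # the node already exists, so this broker lost the race
        self.controller = broker_id
        self.controller_epoch += 1
        return True

    def session_expired(self, broker_id):
        if self.controller == broker_id:
            self.controller = None  # an ephemeral node vanishes with its session


def accept_request(request_epoch, current_epoch):
    """Requests from a previous controller carry a stale epoch and are rejected."""
    return request_epoch >= current_epoch


zk = FakeZooKeeper()
first_round = [zk.try_create_controller(b) for b in (1, 2, 3)]
old_epoch = zk.controller_epoch

zk.session_expired(1)  # the controller on broker 1 dies
second_round = [zk.try_create_controller(b) for b in (3, 2)]
print(first_round, second_round, zk.controller, zk.controller_epoch)
print(accept_request(old_epoch, zk.controller_epoch))  # False
```

The epoch check is what keeps a controller that was partitioned away, and does not yet know it was replaced, from issuing commands that other brokers would act on.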

20. Why is Kafka so fast?

Sequential disk I/O.

Leverages the operating system’s page cache.

Zero‑copy data transfer from kernel buffers to sockets.

Partitioned log files with index files for fast look‑ups.

Batch reads and writes.

Batch compression, plus memory-mapped (mmap) index files for efficient I/O.
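The index-file look-up mentioned in this list can be illustrated with a binary search over a sparse index. The index entries below are invented, and the sketch assumes each entry maps a message offset to a byte position in the log segment.

```python
import bisect

# Hypothetical sparse index: (message offset, byte position in the segment file).
index = [(0, 0), (40, 4096), (85, 8192), (130, 12288)]


def lookup(index, target_offset):
    """Return the last index entry whose offset is <= target_offset."""
    offsets = [offset for offset, _ in index]
    i = bisect.bisect_right(offsets, target_offset) - 1
    if i < 0:
        raise KeyError(f"offset {target_offset} is before the start of this segment")
    return index[i]


entry = lookup(index, 100)
print(entry)  # (85, 8192)
```

A reader looking for offset 100 would seek to byte 8192 and scan forward sequentially from offset 85, so only a short stretch of the log is read.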

21. Under what circumstances can Kafka lose messages?

Producer acks=0 or acks=1 with leader failure before followers sync.

Broker crash before the OS flushes page‑cache data to disk.

Automatic offset commits before the application finishes processing, causing loss on failure.
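The consumer-side loss case comes down to whether the offset is committed before or after processing. The simulation below is deliberately simplified (real auto-commit runs on a timer during polling), and run_consumer is a hypothetical helper.

```python
def run_consumer(messages, crash_at, commit_before_processing):
    """Consume until a crash at offset `crash_at`; return (committed offset, lost messages)."""
    committed, processed = 0, []
    for offset, message in enumerate(messages):
        if commit_before_processing:
            committed = offset + 1  # offset is committed before the work is done
        if offset == crash_at:
            break  # the consumer dies while handling this message
        processed.append(message)
        if not commit_before_processing:
            committed = offset + 1  # commit only after successful processing

    resumed = messages[committed:]  # what a restarted consumer will read
    lost = [m for m in messages if m not in processed and m not in resumed]
    return committed, lost


messages = ["a", "b", "c", "d"]
early = run_consumer(messages, crash_at=2, commit_before_processing=True)
late = run_consumer(messages, crash_at=2, commit_before_processing=False)
print(early)  # (3, ['c'])
print(late)   # (2, [])
```

Committing after processing trades loss for possible duplicates: if the crash happens between processing and commit, the restarted consumer reads the same message again.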

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
