Kafka Architecture, Core Concepts, and Operational Best Practices
This article provides a comprehensive overview of Kafka's architecture, core concepts, high‑throughput design, replication, network model, capacity planning, producer and consumer tuning, custom partitioning, rebalance strategies, broker management, and operational tools for building and maintaining robust distributed messaging systems.
Kafka is a high-throughput, distributed messaging system that decouples services, enables asynchronous processing, and absorbs traffic spikes such as flash-sale events.
Core concepts include brokers, producers, consumers, topics, partitions, consumer groups, and the controller, a broker elected through ZooKeeper to manage cluster metadata.
In a Kafka cluster each broker stores partitions as directories on disk. Each partition's log is split into segment files (1 GB by default). Kafka relies on sequential, append-only disk writes and the OS page cache for write throughput, and on zero-copy transfer (sendfile) for read throughput.
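As a rough illustration of the zero-copy read path, the JDK exposes sendfile-style transfer through FileChannel.transferTo. The sketch below is not Kafka's own code: the class name, method name, and file contents are hypothetical, and the demo writes to an in-memory channel, so the kernel-level shortcut only applies when the target is a real socket channel.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySketch {

    /** Sends up to `count` bytes of a segment file, starting at `position`, to `target`. */
    public static long sendSegment(Path segment, long position, long count,
                                   WritableByteChannel target) throws IOException {
        try (FileChannel file = FileChannel.open(segment, StandardOpenOption.READ)) {
            long sent = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (sent < count) {
                long n = file.transferTo(position + sent, count - sent, target);
                if (n <= 0) {
                    break; // reached end of file or the target accepted nothing
                }
                sent += n;
            }
            return sent;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical segment content, used only for the demo.
        Path segment = Files.createTempFile("segment", ".log");
        Files.write(segment, "m0m1m2m3".getBytes(StandardCharsets.UTF_8));

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long sent = sendSegment(segment, 2, 4, Channels.newChannel(out));
        System.out.println(sent + " bytes: " + out.toString("UTF-8"));
        Files.delete(segment);
    }
}
```

With a socket channel as the target, the bytes can move from the page cache to the network without being copied into the application's heap, which is the effect the sendfile optimization is after.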
Log indexing employs sparse indexes with binary search to locate messages quickly, while replication provides high availability through leader‑follower pairs and ISR lists.
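The sparse-index lookup can be sketched as a binary search for the greatest indexed offset that does not exceed the target, followed by a short sequential scan from the returned file position. This is a minimal sketch under assumptions: the class name, the array layout, and the sample entries are hypothetical and do not reflect Kafka's on-disk index format.

```java
public class SparseIndexSketch {
    private final long[] offsets;   // indexed message offsets, ascending
    private final long[] positions; // byte position in the segment for each indexed offset

    public SparseIndexSketch(long[] offsets, long[] positions) {
        if (offsets.length != positions.length) {
            throw new IllegalArgumentException("offsets and positions must have equal length");
        }
        this.offsets = offsets;
        this.positions = positions;
    }

    /**
     * Returns the file position of the greatest indexed offset <= targetOffset,
     * or 0 (start of the segment) when the target precedes every index entry.
     */
    public long lookup(long targetOffset) {
        int lo = 0;
        int hi = offsets.length - 1;
        int best = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (offsets[mid] <= targetOffset) {
                best = mid;      // candidate; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return best < 0 ? 0 : positions[best];
    }

    public static void main(String[] args) {
        // Hypothetical index: only some offsets are recorded.
        SparseIndexSketch index = new SparseIndexSketch(
                new long[] {0, 40, 85, 130},
                new long[] {0, 4096, 8192, 12288});
        // Offset 100 is not indexed; scanning starts at the entry for offset 85.
        System.out.println(index.lookup(100));
    }
}
```

Because only a fraction of messages is indexed, the index stays small enough to search in memory, at the cost of a bounded scan after each lookup.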
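Network design follows a reactor pattern: an acceptor thread hands connections to multiple processor threads, each with its own selector, and requests pass through queues to a pool of handler threads. 10 GbE NICs keep the network from becoming the bottleneck under very high concurrency.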
Production-grade deployment requires capacity planning: estimating request volume and storage (e.g., 10 billion daily requests ≈ 276 TB with a replication factor of 2, a figure that also depends on average message size and retention period), then sizing the number of physical servers, choosing SSD vs. HDD, and reserving memory for the OS page cache (≈ 60 GB) and CPU cores (≥ 16).
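The storage part of such an estimate is plain multiplication, sketched below. All names and input values are hypothetical, chosen only to show the arithmetic. The sketch does not attempt to reproduce the 276 TB figure, because the message size and retention period behind it are not given here.

```java
public class CapacitySketch {

    /** Raw bytes on disk = requests/day x average message size x replicas x retention days. */
    public static long storageBytes(long requestsPerDay, long avgMessageBytes,
                                    int replicationFactor, int retentionDays) {
        long perDay = Math.multiplyExact(requestsPerDay, avgMessageBytes);
        return Math.multiplyExact(perDay, (long) replicationFactor * retentionDays);
    }

    /** Servers needed to hold totalBytes, rounding up. */
    public static long serversNeeded(long totalBytes, long usableBytesPerServer) {
        return (totalBytes + usableBytesPerServer - 1) / usableBytesPerServer;
    }

    public static void main(String[] args) {
        // Assumed inputs: 1 billion requests/day, 1 KiB each, 2 replicas, 3 days retained.
        long total = storageBytes(1_000_000_000L, 1_024L, 2, 3);
        // Assumed usable disk per server: 4 TiB.
        long servers = serversNeeded(total, 4L * 1024 * 1024 * 1024 * 1024);
        System.out.println(total + " bytes across at least " + servers + " servers");
    }
}
```

A real plan would add headroom for peak traffic, index files, and disk-usage limits on top of this raw figure.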
Producer tuning parameters such as buffer.memory, compression.type, batch.size, linger.ms and acks trade throughput against latency and durability.
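A minimal sketch of how these settings might be collected for a producer follows. The property keys are the standard producer configuration names. The class name, method name, and every value are illustrative assumptions, not recommendations from the source.

```java
import java.util.Properties;

public class ProducerTuningSketch {

    /** Builds producer settings; all values here are illustrative assumptions. */
    public static Properties tunedProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        props.put("buffer.memory", "67108864"); // 64 MB of client-side buffering
        props.put("compression.type", "lz4");   // compress whole batches
        props.put("batch.size", "32768");       // up to 32 KB per partition batch
        props.put("linger.ms", "20");           // wait up to 20 ms to fill a batch
        // acks: "0" = do not wait, "1" = leader only, "all" = all in-sync replicas.
        props.put("acks", "all");
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder address.
        tunedProducerProps("localhost:9092")
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With the kafka-clients library on the classpath, the resulting Properties object can be passed to the KafkaProducer constructor. Larger batch.size and linger.ms values generally raise throughput and add latency.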
Consumer offset management moved from ZooKeeper to the internal __consumer_offsets topic, with configurable commit intervals and offset reset policies.
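The commit interval and reset policy mentioned above map to consumer configuration keys, sketched here. The keys are standard consumer settings. The class name, group name, and values are assumptions made for illustration.

```java
import java.util.Properties;

public class ConsumerOffsetSketch {

    /** Builds consumer settings related to offset handling; values are illustrative. */
    public static Properties offsetProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // Commit consumed offsets to __consumer_offsets automatically every 5 seconds.
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "5000");
        // Where to start when the group has no committed offset:
        // "earliest", "latest", or "none" (fail instead of guessing).
        props.put("auto.offset.reset", "earliest");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder address and hypothetical group name.
        offsetProps("localhost:9092", "demo-group")
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Setting enable.auto.commit to false and committing explicitly after processing is the usual alternative when a message must not be marked consumed before it has been handled.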
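Custom partitioners can be implemented in Java and registered on the producer with props.put("partitioner.class", "com.zhss.HotDataPartitioner"). The HotDataPartitioner below sends every key containing "hot_data" to the topic's last partition and spreads all other keys randomly over the remaining partitions:

    package com.zhss;

    import java.util.Map;
    import java.util.Random;
    import org.apache.kafka.clients.producer.Partitioner;
    import org.apache.kafka.common.Cluster;

    public class HotDataPartitioner implements Partitioner {
        private Random random;

        @Override
        public void configure(Map<String, ?> configs) {
            random = new Random();
        }

        @Override
        public int partition(String topic, Object keyObj, byte[] keyBytes,
                             Object value, byte[] valueBytes, Cluster cluster) {
            // Count all partitions (not only available ones) so the hot partition's ID stays stable.
            int partitionCount = cluster.partitionsForTopic(topic).size();
            if (partitionCount <= 1) {
                return 0; // nothing to separate; also avoids nextInt(0)
            }
            String key = keyObj == null ? "" : keyObj.toString(); // null keys count as non-hot
            int hotDataPartition = partitionCount - 1;
            return key.contains("hot_data") ? hotDataPartition : random.nextInt(partitionCount - 1);
        }

        @Override
        public void close() {
            // Nothing to release.
        }
    }

Rebalance strategies (range, round-robin, sticky) determine how partitions are assigned to consumers; the group coordinator drives each rebalance through the join and sync phases.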
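To make the difference between the range and round-robin strategies concrete, the sketch below assigns the partitions of a single topic to a list of consumers in both ways. It is a simplified model with hypothetical names, not the code of Kafka's built-in assignors, and it leaves out the sticky strategy, which additionally tries to preserve previous assignments.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AssignmentSketch {

    /** Range: contiguous blocks; the first (partitions % consumers) members get one extra. */
    public static Map<String, List<Integer>> range(List<String> consumers, int partitions) {
        List<String> sorted = sortedNonEmpty(consumers);
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int base = partitions / sorted.size();
        int extra = partitions % sorted.size();
        int next = 0;
        for (int i = 0; i < sorted.size(); i++) {
            int count = base + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                owned.add(next++);
            }
            result.put(sorted.get(i), owned);
        }
        return result;
    }

    /** Round-robin: partition p goes to consumer number (p mod consumers). */
    public static Map<String, List<Integer>> roundRobin(List<String> consumers, int partitions) {
        List<String> sorted = sortedNonEmpty(consumers);
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String consumer : sorted) {
            result.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            result.get(sorted.get(p % sorted.size())).add(p);
        }
        return result;
    }

    private static List<String> sortedNonEmpty(List<String> consumers) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("at least one consumer is required");
        }
        List<String> sorted = new ArrayList<>(consumers);
        Collections.sort(sorted); // deterministic member order
        return sorted;
    }

    public static void main(String[] args) {
        List<String> group = List.of("c0", "c1", "c2"); // hypothetical member ids
        System.out.println(range(group, 8));      // {c0=[0, 1, 2], c1=[3, 4, 5], c2=[6, 7]}
        System.out.println(roundRobin(group, 8)); // {c0=[0, 3, 6], c1=[1, 4, 7], c2=[2, 5]}
    }
}
```

For one topic both strategies give the same partition counts per consumer. The range strategy's skew shows up across many topics, because the same first consumers receive the extra partition each time.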
Broker management includes tracking each replica's Log End Offset (LEO) and the partition's High Watermark (HW), controller election, delayed operations, and a timing-wheel scheduler that inserts delayed tasks in O(1).
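The O(1) insertion claim is easiest to see in a single-level wheel: a task's slot is computed with one modulo operation, whatever the number of pending tasks. The sketch below is a simplified, single-threaded model with hypothetical names. It omits the overflow levels of a hierarchical wheel, so it rejects delays that do not fit in one rotation.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleTimingWheel {
    private final List<List<Runnable>> buckets = new ArrayList<>();
    private final int wheelSize;
    private long currentTick = 0;

    public SimpleTimingWheel(int wheelSize) {
        if (wheelSize < 2) {
            throw new IllegalArgumentException("wheelSize must be at least 2");
        }
        this.wheelSize = wheelSize;
        for (int i = 0; i < wheelSize; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    /** O(1) insert: returns false when the delay does not fit in this wheel. */
    public boolean add(Runnable task, int delayTicks) {
        if (delayTicks < 1 || delayTicks >= wheelSize) {
            return false; // a hierarchical wheel would pass this to a coarser level
        }
        int slot = (int) ((currentTick + delayTicks) % wheelSize);
        buckets.get(slot).add(task);
        return true;
    }

    /** Moves the clock forward one tick and runs every task that is now due. */
    public void advance() {
        currentTick++;
        List<Runnable> bucket = buckets.get((int) (currentTick % wheelSize));
        List<Runnable> due = new ArrayList<>(bucket);
        bucket.clear(); // clear first so tasks may safely re-schedule themselves
        due.forEach(Runnable::run);
    }

    public static void main(String[] args) {
        SimpleTimingWheel wheel = new SimpleTimingWheel(8);
        wheel.add(() -> System.out.println("fired after 3 ticks"), 3);
        for (int i = 0; i < 3; i++) {
            wheel.advance();
        }
    }
}
```

A priority queue would need O(log n) per insert. The wheel trades that for a fixed time resolution of one tick.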
Operational tools such as Kafka-Manager and command-line utilities (kafka-topics.sh, kafka-reassign-partitions.sh) assist with topic creation, partition reassignment, and load balancing.
IT Architects Alliance
