Comprehensive Guide to Kafka Architecture, Core Concepts, Deployment, and Operations

This article provides an in‑depth overview of Kafka, covering why messaging systems are needed, core concepts, cluster architecture, performance optimizations such as sequential disk writes and zero‑copy, resource planning, deployment steps, configuration details, operational tools, and advanced topics like custom partitioners and time‑wheel scheduling.

Top Architect

Mar 12, 2022

01、Why Use a Messaging System

A messaging system decouples components, enables asynchronous processing, and smooths traffic spikes. The section illustrates these benefits with an e‑commerce flash‑sale workflow.

02、Kafka Core Concepts

Explains producers, consumers, topics, partitions, and how Kafka stores massive data across multiple brokers.

03、Kafka Cluster Architecture

Describes brokers, topics, partitions, consumer groups, controllers, and Zookeeper coordination.

04、Sequential Disk Writes for High Write Performance

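Kafka appends data to the OS cache, and the operating system later flushes it to disk. Because every write goes to the end of a file, the disk avoids seek overhead, and write throughput can approach that of memory.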
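The append-only write pattern can be sketched with the JDK alone. This is a minimal illustration, not Kafka's log implementation: the class name AppendOnlyLog and the length-prefixed record layout are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical append-only log: every write lands at the end of the file.
final class AppendOnlyLog implements AutoCloseable {
    private final FileChannel channel;

    AppendOnlyLog(Path file) {
        try {
            channel = FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Appends one length-prefixed record and returns the byte position where it starts.
    long append(String message) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        ByteBuffer record = ByteBuffer.allocate(4 + payload.length);
        record.putInt(payload.length).put(payload);
        record.flip();
        try {
            long start = channel.size();
            while (record.hasRemaining()) {
                channel.write(record); // lands in the OS cache first
            }
            return start;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Asks the OS to flush cached data for this file to the disk.
    void flush() {
        try {
            channel.force(false);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling append never seeks backwards in the file, which is the property that makes mechanical disks viable for this workload.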

05、Zero‑Copy Mechanism for High Read Performance

Outlines the consumer read path: data is served from the OS cache and sent to the network with Linux sendfile, which avoids copying it through application memory.
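In Java, the usual way to request this kind of transfer is FileChannel.transferTo, which lets the operating system move bytes from the file to the target channel and, on many systems, skip the copy into user space. The sketch below is an illustration with a hypothetical class name (ZeroCopySend); whether sendfile is actually used depends on the platform and on the type of the target channel.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ZeroCopySend {
    // Sends the whole file to a blocking target channel and returns the bytes sent.
    static long send(Path source, WritableByteChannel target) {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            long position = 0;
            long size = in.size();
            while (position < size) {
                // transferTo may move fewer bytes than requested, so loop until done.
                position += in.transferTo(position, size - position, target);
            }
            return position;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a broker-like setting the target would be a SocketChannel connected to the consumer; any WritableByteChannel works for experimenting.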

06、Log Segmentation

Each partition stores its data as a series of .log segment files, typically 1 GB each; the partitions of a topic are distributed across multiple servers.
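Kafka names each segment file after the offset of its first message, zero-padded to 20 digits, so the segment that holds a given offset is the one with the largest base offset not exceeding it. The following sketch shows that lookup with a TreeMap; the class SegmentLookup is hypothetical and only models the naming and lookup, not the files themselves.

```java
import java.util.Map;
import java.util.TreeMap;

final class SegmentLookup {
    // Base offset of each segment mapped to its file name.
    private final TreeMap<Long, String> segments = new TreeMap<>();

    // Segment files are named after their base offset, zero-padded to 20 digits.
    static String fileName(long baseOffset) {
        return String.format("%020d.log", baseOffset);
    }

    void addSegment(long baseOffset) {
        segments.put(baseOffset, fileName(baseOffset));
    }

    // Returns the segment with the largest base offset <= offset, or null if none exists.
    String segmentFor(long offset) {
        Map.Entry<Long, String> entry = segments.floorEntry(offset);
        return entry == null ? null : entry.getValue();
    }
}
```

With segments starting at offsets 0 and 368769, a request for offset 400000 resolves to the segment whose base offset is 368769.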

07、Binary Search for Data Location

Kafka uses sparse indexes and binary search to locate messages efficiently.
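A sparse index records only some messages (by default roughly one entry per 4 KB of log data, controlled by index.interval.bytes), so a lookup finds the nearest indexed entry at or before the target and then scans forward in the .log file. The sketch below shows the binary-search step; the class SparseIndex and its in-memory arrays are assumptions for illustration, not Kafka's on-disk index format.

```java
final class SparseIndex {
    private final long[] offsets;   // indexed message offsets, ascending
    private final long[] positions; // byte position of each indexed message in the .log file

    SparseIndex(long[] offsets, long[] positions) {
        if (offsets.length != positions.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        this.offsets = offsets;
        this.positions = positions;
    }

    // Returns the file position of the last indexed entry with offset <= targetOffset,
    // or 0 (start of the segment) when the target precedes every indexed entry.
    long lookup(long targetOffset) {
        int lo = 0;
        int hi = offsets.length - 1;
        int best = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (offsets[mid] <= targetOffset) {
                best = mid;      // candidate; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return best < 0 ? 0L : positions[best];
    }
}
```

The reader then scans the log sequentially from the returned position until it reaches the requested offset, which bounds the scan to one index interval.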

08、High‑Concurrency Network Design (NIO Overview)

Discusses Reactor network patterns and Kafka’s network architecture that support high concurrency.

09、Redundant Replicas for High Availability

Explains leader‑follower replication, ISR lists, and the need for multiple replicas.
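The ISR (in-sync replica) list can be pictured as the set of replicas that have caught up with the leader recently enough; in Kafka the threshold is the broker setting replica.lag.time.max.ms. The sketch below is a simplified model under that reading: IsrTracker and its methods are hypothetical, and the real broker tracks fetch progress in more detail.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class IsrTracker {
    private final long maxLagMs; // plays the role of replica.lag.time.max.ms
    private final Map<Integer, Long> lastCaughtUpMs = new HashMap<>();

    IsrTracker(long maxLagMs) {
        this.maxLagMs = maxLagMs;
    }

    // Records that a follower was fully caught up with the leader at the given time.
    void recordCaughtUp(int brokerId, long nowMs) {
        lastCaughtUpMs.put(brokerId, nowMs);
    }

    // The leader is always in the ISR; followers stay in it while their lag is within the limit.
    Set<Integer> isr(int leaderId, long nowMs) {
        Set<Integer> result = new TreeSet<>();
        result.add(leaderId);
        for (Map.Entry<Integer, Long> entry : lastCaughtUpMs.entrySet()) {
            if (nowMs - entry.getValue() <= maxLagMs) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

When the leader fails, a new leader is chosen from this set, which is why a single replica is not enough for availability.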

10、Architecture Summary

Kafka achieves high concurrency, availability, and performance through replication, network design, sequential writes, and zero‑copy.

11、Production Environment Setup

Provides a step‑by‑step guide to building a Kafka cluster for a large‑scale e‑commerce scenario.

12、Scenario Analysis

An e‑commerce platform needs to send 1 billion requests per day to the Kafka cluster.
1 billion requests → 24 GB/day, peak QPS ≈ 55 k.
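The peak figure can be reproduced with back-of-the-envelope arithmetic. The ratios used here are an assumption, not something this summary states: 80% of the daily requests arrive in 16 busy hours, and 80% of those arrive in 20% of that window. The class PeakQpsEstimate is hypothetical.

```java
final class PeakQpsEstimate {
    // Applies an assumed 80/20 split twice to estimate the peak request rate.
    static long peakQps(long requestsPerDay) {
        long busyRequests = requestsPerDay * 80 / 100; // assumed: 80% of traffic in 16 busy hours
        long peakRequests = busyRequests * 80 / 100;   // assumed: 80% of that in the peak window
        long peakSeconds = 16L * 3600 * 20 / 100;      // peak window = 20% of 16 h = 11,520 s
        return peakRequests / peakSeconds;
    }
}
```

For 1,000,000,000 requests per day this gives 640,000,000 / 11,520 = 55,555 requests per second (integer division), in line with the ≈55 k peak quoted above.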

13‑18、Resource Evaluation

Assesses physical machines, disk selection (mechanical HDD sufficient for sequential writes), memory sizing (≈64 GB), CPU cores (≥16, preferably 32), and network bandwidth (10 GbE recommended).

19‑22、Cluster Planning and Zookeeper

Details host layout, Zookeeper ensemble, and controller responsibilities.

23‑25、Kafka Operations

Introduces KafkaManager, common commands for topic creation, partition scaling, and replica reassignment.

26‑31、Producer and Consumer Configuration

Covers producer settings (buffer.memory, compression.type, batch.size, linger.ms) and consumer error handling (LeaderNotAvailableException, retries, network exceptions).
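The producer settings named above are plain string keys, so they can be collected in a java.util.Properties object. The values below are illustrative assumptions rather than recommendations from the article, and the class ProducerSettings is hypothetical. A real producer would also need bootstrap.servers and serializer settings plus the kafka-clients library, which are left out to keep the sketch self-contained.

```java
import java.util.Properties;

final class ProducerSettings {
    // Builds an example set of throughput-oriented producer settings (values are illustrative).
    static Properties tuned() {
        Properties props = new Properties();
        props.setProperty("buffer.memory", "67108864"); // 64 MB of memory for unsent records
        props.setProperty("compression.type", "lz4");   // compress batches before sending
        props.setProperty("batch.size", "32768");       // up to 32 KB per partition batch
        props.setProperty("linger.ms", "100");          // wait up to 100 ms to fill a batch
        props.setProperty("retries", "3");              // retry transient send failures
        props.setProperty("retry.backoff.ms", "300");   // pause between retries
        return props;
    }
}
```

Larger batches and a non-zero linger.ms trade a little latency for fewer, larger requests; retries cover transient errors such as a leader election in progress.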

32、ACK Parameter Details

Explains acks=0/1/-1 and the role of min.insync.replicas for data durability.
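The three acks levels differ in when a send counts as confirmed. The following decision function is a simplified model for reasoning about durability, not broker code; AckPolicy and its parameters are hypothetical names.

```java
final class AckPolicy {
    // Returns true when the producer would treat the send as confirmed.
    //   acks = 0  : the producer does not wait for any response
    //   acks = 1  : the leader has written the record
    //   acks = -1 : the ISR has at least minInsyncReplicas members and all of them hold the record
    static boolean confirmed(int acks, boolean leaderWrote, int isrSize,
                             int isrCopies, int minInsyncReplicas) {
        switch (acks) {
            case 0:
                return true;
            case 1:
                return leaderWrote;
            case -1:
                if (isrSize < minInsyncReplicas) {
                    return false; // the broker rejects the write: not enough in-sync replicas
                }
                return leaderWrote && isrCopies == isrSize;
            default:
                throw new IllegalArgumentException("acks must be 0, 1 or -1");
        }
    }
}
```

The model shows why acks=-1 alone is not enough: if the ISR has shrunk to just the leader and min.insync.replicas is 1, a write is confirmed with a single copy. Setting min.insync.replicas to 2 or more closes that gap.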

33、Custom Partitioner Example

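import java.util.List;
import java.util.Map;
import java.util.Random;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;

// Sends keys containing "hot_data" to the last partition; all other keys go to a random remaining partition.
public class HotDataPartitioner implements Partitioner {
    private Random random;
    @Override
    public void configure(Map<String, ?> configs) { random = new Random(); }
    @Override
    public int partition(String topic, Object keyObj, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        String key = keyObj == null ? "" : keyObj.toString(); // tolerate null or non-String keys
        List<PartitionInfo> partitionInfoList = cluster.availablePartitionsForTopic(topic);
        int partitionCount = partitionInfoList.size();
        int hotDataPartition = partitionCount - 1;
        if (partitionCount <= 1) return 0; // random.nextInt(0) would throw
        return key.contains("hot_data") ? hotDataPartition : random.nextInt(partitionCount - 1);
    }
    @Override
    public void close() { } // required by the Partitioner interface; nothing to release
}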

34‑42、Comprehensive Case Studies

Shows an e‑commerce “star” reward system where orders are produced to Kafka and a membership service consumes them, discussing key‑based ordering, offset management, consumer groups, and rebalance strategies (range, round‑robin, sticky).
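Two of the strategies mentioned here, range and round‑robin, are easy to compare for a single topic. The sketch below is a simplified model with hypothetical names (AssignmentSketch, consumers identified by list index); the sticky strategy, which also tries to preserve previous assignments across rebalances, is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

final class AssignmentSketch {
    // Range: contiguous blocks; the first (partitions % consumers) consumers get one extra.
    static List<List<Integer>> range(int partitions, int consumers) {
        List<List<Integer>> result = new ArrayList<>();
        int base = partitions / consumers;
        int extra = partitions % consumers;
        int next = 0;
        for (int c = 0; c < consumers; c++) {
            int count = base + (c < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                owned.add(next++);
            }
            result.add(owned);
        }
        return result;
    }

    // Round-robin: partition p goes to consumer p % consumers.
    static List<List<Integer>> roundRobin(int partitions, int consumers) {
        List<List<Integer>> result = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            result.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            result.get(p % consumers).add(p);
        }
        return result;
    }
}
```

For 10 partitions and 3 consumers, range yields [0, 1, 2, 3], [4, 5, 6], [7, 8, 9] and round‑robin yields [0, 3, 6, 9], [1, 4, 7], [2, 5, 8].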

43‑45、Group Coordinator and Rebalance Strategies

Explains how a coordinator is selected, offset commit flow, and the three rebalance algorithms.
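Coordinator selection is commonly described as hashing the group id onto a partition of the internal __consumer_offsets topic (50 partitions by default, set by offsets.topic.num.partitions); the broker that leads that partition acts as the group's coordinator and stores its committed offsets. The sketch below shows that mapping under this description; CoordinatorLookup is a hypothetical name.

```java
final class CoordinatorLookup {
    // Maps a consumer group id to a partition of the offsets topic.
    static int offsetsPartitionFor(String groupId, int offsetsTopicPartitions) {
        int hash = groupId.hashCode();
        // Math.abs(Integer.MIN_VALUE) is still negative, so handle it explicitly.
        int positive = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return positive % offsetsTopicPartitions;
    }
}
```

Because the mapping depends only on the group id and the partition count, every broker and client computes the same answer without extra coordination.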

46‑48、LEO and HW Concepts

Defines Log End Offset (LEO) and High Watermark (HW) and their impact on message visibility.
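The relationship between the two offsets can be stated compactly: the HW of a partition is the smallest LEO among its in-sync replicas, and consumers can read only messages below the HW. A minimal sketch, with HighWatermark as a hypothetical helper class:

```java
import java.util.Collection;

final class HighWatermark {
    // HW is the smallest LEO among the in-sync replicas.
    static long of(Collection<Long> isrLogEndOffsets) {
        if (isrLogEndOffsets.isEmpty()) {
            throw new IllegalArgumentException("ISR must contain at least the leader");
        }
        long hw = Long.MAX_VALUE;
        for (long leo : isrLogEndOffsets) {
            hw = Math.min(hw, leo);
        }
        return hw;
    }

    // A message is visible to consumers only when its offset is below the HW.
    static boolean visibleToConsumers(long offset, long hw) {
        return offset < hw;
    }
}
```

With LEOs of 10, 8, and 9 the HW is 8, so offsets 0 through 7 are readable while offsets 8 and 9 wait for the slower replicas.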

49、Controller Management

Describes controller election via Zookeeper and its responsibilities.

50‑51、Delayed Tasks and Time‑Wheel Mechanism

Details Kafka’s internal delayed operations (e.g., acks timeout, follower fetch) and the O(1) time‑wheel scheduler used for them.
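The O(1) cost comes from placing each task directly into the bucket for its expiry tick instead of keeping tasks sorted. The sketch below is a single-level wheel for illustration only; Kafka's implementation is hierarchical, with overflow wheels for longer delays. SimpleTimingWheel and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class SimpleTimingWheel {
    private final long tickMs;
    private final int wheelSize;
    private final List<List<Runnable>> buckets = new ArrayList<>();
    private long currentTick = 0; // ticks elapsed since the wheel started

    SimpleTimingWheel(long tickMs, int wheelSize) {
        this.tickMs = tickMs;
        this.wheelSize = wheelSize;
        for (int i = 0; i < wheelSize; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // O(1) insert: the delay is rounded down to whole ticks and mapped to one bucket.
    // Returns false when the delay does not fit in this wheel (an overflow wheel would take it).
    boolean add(Runnable task, long delayMs) {
        long ticks = delayMs / tickMs;
        if (ticks >= wheelSize) {
            return false;
        }
        if (ticks <= 0) {
            task.run(); // already due
            return true;
        }
        int slot = (int) ((currentTick + ticks) % wheelSize);
        buckets.get(slot).add(task);
        return true;
    }

    // Moves the clock forward by one tick and runs every task in the bucket that became due.
    void advanceOneTick() {
        currentTick++;
        List<Runnable> bucket = buckets.get((int) (currentTick % wheelSize));
        List<Runnable> due = new ArrayList<>(bucket); // copy so tasks may add new timers
        bucket.clear();
        for (Runnable task : due) {
            task.run();
        }
    }
}
```

With a 100 ms tick and 8 buckets, a task added with a 250 ms delay runs on the second tick, and a delay of 800 ms or more is rejected by this single wheel.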
