Master Kafka: From Core Concepts to Advanced Operations and Performance Tuning
This guide covers Kafka’s origins, core architecture, data structures, and write and read workflows, then walks through operational commands for topic and consumer-group management and practical performance-tuning tips: disk layout, JVM settings, flush policies, and log retention. It is intended as a compact reference for engineers working with distributed streaming platforms.
1 Kafka Overview
Kafka was originally developed at LinkedIn as a multi‑partition, multi‑replica distributed messaging system coordinated by ZooKeeper, and was later donated to the Apache Software Foundation. Written in Scala and Java, it functions as a high‑throughput, durable, horizontally scalable stream‑processing platform.
It enables producers to publish messages to a topic while consumers subscribe to that topic and receive messages with low latency and high throughput. Within a consumer group, each partition is consumed by only one consumer at a time; different groups consume the same topic independently.
Compared with traditional MQs, Kafka is lightweight, non‑intrusive, and requires minimal dependencies.
Kafka is widely integrated with systems like Cloudera, Storm, Spark, and Flink, serving three roles: messaging system, storage system, and stream processing platform.
2 What Problems Kafka Solves
It addresses asynchronous processing, service decoupling, and traffic control, similar to traditional message queues.
3 Technical Features
High throughput and low latency (hundreds of thousands of messages per second, millisecond latency).
Scalable clusters with hot‑add capability.
Persistence and reliability via disk storage and replication.
Fault tolerance allowing node failures (n‑1 failures tolerated for n replicas).
Supports thousands of concurrent clients.
Queue mode: consumers in the same group share the topic like a queue, with partitions consumed in parallel across the group.
Publish‑subscribe mode: messages are broadcast to all subscribing consumers.
4 Kafka Working Principle
4.1 Architecture Diagram
Producer
Creates messages and sends them to Kafka.
Consumer
Connects to Kafka to receive messages and process business logic.
Consumer Group (CG)
A set of consumers where each partition is consumed by only one consumer in the group; groups operate independently.
Broker
A Kafka service node; one or more brokers form a cluster.
Controller
A broker elected as the controller manages partition and replica states across the cluster.
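The consumer-group rule above — each partition owned by exactly one consumer per group — can be sketched with a simplified range-style assignment (a simplification of the Java client's default assignor; all names here are illustrative):

```python
# Simplified range-style assignment of partitions to the consumers of one
# group: each partition gets exactly one owner within the group, and a
# different group would run the same assignment independently.
def assign_partitions(consumers, num_partitions):
    consumers = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(consumers))
    assignment, start = {}, 0
    for i, c in enumerate(consumers):
        # the first `extra` consumers take one additional partition
        count = per_consumer + (1 if i < extra else 0)
        assignment[c] = list(range(start, start + count))
        start += count
    return assignment

print(assign_partitions(["c1", "c2", "c3"], 8))
# → {'c1': [0, 1, 2], 'c2': [3, 4, 5], 'c3': [6, 7]}
```

Note that with more consumers than partitions, the surplus consumers simply receive no partitions, which is why partition count caps group parallelism.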
4.2 Write Process
Connect to ZooKeeper to obtain partition and leader information for the target topic.
Send the message to the appropriate broker.
Specify topic, value, optional partition, and key.
Serialize the message.
If a partition is specified, the partitioner is bypassed; otherwise the partitioner selects a partition based on the message key.
Messages are batched locally; the client sends batches to the broker.
The broker acknowledges with topic, partition, and offset information.
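The partition-selection step above can be sketched as follows. This is a simplification: the real Java client hashes keys with murmur2 and, for keyless records, uses round-robin or sticky batching rather than a random choice.

```python
import hashlib
import random

def choose_partition(key, num_partitions, explicit_partition=None):
    """Pick the partition for a record, mirroring the producer-side flow
    described above: an explicit partition bypasses the partitioner."""
    if explicit_partition is not None:
        return explicit_partition
    if key is None:
        # keyless records: spread across partitions (real clients batch these)
        return random.randrange(num_partitions)
    # keyed records: a stable hash keeps equal keys in the same partition,
    # which preserves per-key ordering
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

p = choose_partition("user-42", 6)
assert p == choose_partition("user-42", 6)  # same key, same partition
```

Because equal keys always land in the same partition, per-key ordering is guaranteed even though the topic as a whole is only ordered within each partition.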
4.3 Read Process
Connect to ZooKeeper to get partition and leader info.
Connect to the leader broker.
Consumer requests the desired topic, partition, and offset.
The leader locates the segment (index and log files) based on the offset.
Using the index, it reads the appropriate log segment and returns the data.
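The lookup in the last two steps can be sketched: the sparse index is binary-searched for the greatest indexed offset at or below the requested one, and the log file is then scanned forward from that position (illustrative Python, not Kafka's actual implementation):

```python
import bisect

def locate(index, target_offset):
    """index: sorted list of (offset, position) pairs from a sparse index
    file. Returns the file position to start scanning the log file from."""
    offsets = [o for o, _ in index]
    # rightmost indexed offset <= target_offset
    i = bisect.bisect_right(offsets, target_offset) - 1
    if i < 0:
        return 0  # target precedes the first index entry: scan from the start
    return index[i][1]

# Three sparse entries covering offsets 100..; offset 110 is not indexed,
# so the scan starts at offset 107's position and walks forward.
sparse_index = [(100, 0), (107, 2048), (114, 4126)]
print(locate(sparse_index, 110))  # → 2048
```

Keeping the index sparse trades a short sequential scan for a much smaller index file that can stay memory-mapped.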
4.4 Core Concepts
Topic
A logical queue where producers write and consumers read.
Partition
Each topic is split into ordered partitions; the partition count sets the upper bound on consumer parallelism within a group.
Replica
Each partition has multiple replicas for fault tolerance; one leader handles reads/writes, followers replicate the data.
5 Kafka Data Structure Explanation
5.1 Registration Data in ZooKeeper
Kafka stores metadata in ZooKeeper and uses watches for changes (e.g., consumer failures).
Cluster registration: a persistent node /cluster/id holds the cluster ID (e.g. <code>{"version":"1","id":"0"}</code>).
Broker node registration: each broker creates an ephemeral znode under /brokers/ids with its host, port, and configuration.
Topic registration: znodes under /brokers/topics record each topic's partitions and ISR information.
Controller election node: the ephemeral node /controller holds the current controller's broker ID.
ISR change notification: the node /isr_change_notification triggers leader re‑election on failures.
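These znodes can be inspected with the zookeeper-shell.sh script shipped with Kafka. The paths are the standard ZooKeeper layout; host and port are placeholders. zookeeper-shell.sh reads commands from stdin, so they are piped in here:

```shell
# Read the cluster ID and the current controller from ZooKeeper
echo "get /cluster/id" | bin/zookeeper-shell.sh localhost:2181
echo "get /controller" | bin/zookeeper-shell.sh localhost:2181
# List the ephemeral znodes of the currently live brokers
echo "ls /brokers/ids" | bin/zookeeper-shell.sh localhost:2181
```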
5.2 Topic Data Structure
Each partition has log files (.log) and index files (.index, .timeindex). Log files consist of record batches; each batch stores metadata such as base offset, position, size, CRC, compression type, timestamps, producer ID, and message count.
Example of dumping a log segment:
<code>[root@hybrid03 event_topic-5]# kafka-run-class.sh kafka.tools.DumpLogSegments --files 00000000000000000100.log
Dumping 00000000000000000100.log
Starting offset: 100
baseOffset: 100 lastOffset: 100 count: 1 ...</code>
Index files map offsets to physical positions in the log file; they are sparse, with the indexing density configurable via index.interval.bytes.
<code>offset: 114 position: 4126  # the message at offset 114 starts at physical position 4126 of the log file</code>
6 Kafka Operations
6.1 Topic Management Commands
Use kafka-topics.sh to create, delete, and describe topics, expand partitions, and list topics.
<code># Create topic
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 3 --topic test
# Expand partitions
kafka-topics.sh --alter --zookeeper localhost:2181 --topic test --partitions 4
# Delete topic
kafka-topics.sh --delete --zookeeper localhost:2181 --topic test
# Describe topic
kafka-topics.sh --describe --zookeeper localhost:2181 --topic event_topic</code>
6.2 Rebalancing After Adding/Removing Nodes
Use kafka-reassign-partitions.sh to generate a reassignment plan and execute it.
<code># Create move-json-file.json with topics to move
{
"topics":[{"topic":"event_topic"},{"topic":"profile_topic"},{"topic":"item_topic"}],
"version":1
}
# Generate plan
kafka-reassign-partitions.sh --zookeeper localhost:2181 --topics-to-move-json-file move-json-file.json --broker-list "1001,1002" --generate
# Save the proposed reassignment printed by --generate as reassignment-json-file.json, then execute it
kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file reassignment-json-file.json --execute</code>
6.3 Consumer Group Commands
View consumption status, delete groups, and reset offsets.
<code># Describe group
kafka-consumer-groups.sh --bootstrap-server 127.0.0.1:9092 --describe --group test-group
# Delete group
kafka-consumer-groups.sh --bootstrap-server 127.0.0.1:9092 --delete --group test-group
# Reset offsets (earliest, latest, specific offset, etc.)
kafka-consumer-groups.sh --bootstrap-server kafka-host:port --group test-group --reset-offsets --all-topics --to-earliest --execute</code>
6.4 Set Topic Retention Time
<code># Set retention to 1 hour (3600000 ms)
kafka-configs.sh --zookeeper 127.0.0.1:2181 --alter --entity-name topic-devops-elk-log-hechuan-huanbao --entity-type topics --add-config retention.ms=3600000
# Describe config
kafka-configs.sh --zookeeper 127.0.0.1:2181 --describe --entity-name topic-devops-elk-log-hechuan-huanbao --entity-type topics</code>
6.5 Useful Scripts
Produce and consume messages via console scripts.
<code># Produce
bin/kafka-console-producer.sh --broker-list kafka-host:port --topic test-topic
# Consume from beginning
bin/kafka-console-consumer.sh --bootstrap-server kafka-host:port --topic test-topic --group test-group --from-beginning</code>
Performance testing scripts for producers and consumers are also provided.
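The performance-testing scripts referred to here are kafka-producer-perf-test.sh and kafka-consumer-perf-test.sh; a typical invocation looks like this (host/port, topic name, and record counts/sizes are placeholders to adapt):

```shell
# Produce 1,000,000 records of 1 KB each with throttling disabled (-1)
bin/kafka-producer-perf-test.sh --topic test-topic --num-records 1000000 \
  --record-size 1024 --throughput -1 \
  --producer-props bootstrap.servers=kafka-host:port acks=1

# Consume 1,000,000 messages and report throughput
bin/kafka-consumer-perf-test.sh --bootstrap-server kafka-host:port \
  --topic test-topic --messages 1000000
```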
7 Common Performance Tuning
7.1 Disk Directory Optimization
Distribute partitions across multiple disks to avoid I/O contention. Example configuration:
<code>log.dirs=/data/seqdata00/kafka/data,/data/seqdata01/kafka/data,/data/seqdata02/kafka/data</code>
7.2 JVM Parameter Configuration
Prefer the G1 collector over CMS; G1 requires at least JDK 1.7u51. Its advantages include a better throughput/latency balance, region‑based memory management, and configurable pause‑time targets.
Typical JVM options are shown in the accompanying screenshot.
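As an illustrative sketch of such options (heap size and pause target are assumptions to adapt, not values from this article), G1 settings can be passed to the broker via the environment variables read by kafka-server-start.sh:

```shell
# Illustrative G1 settings for a Kafka broker; sizes are assumptions, not
# values from the original article.
export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"
export KAFKA_JVM_PERFORMANCE_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=20 \
  -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M \
  -XX:MetaspaceSize=96m"
bin/kafka-server-start.sh config/server.properties
```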
7.3 Log Flush Strategy
Configure flushing either by message count (log.flush.interval.messages=100000) or by time interval (log.flush.interval.ms=1000).
7.4 Log Retention Time
Default retention is 7 days; it can be adjusted with log.retention.hours=168 (7 days).
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.