Backend Development 21 min read

Master Kafka: Core Concepts, Architecture, and Practical Tips

This article explains Kafka's fundamentals, including topics, partitions, brokers, replication, producer‑consumer workflow, consumer groups, offset management, and common exception handling, while providing code examples and diagrams to help developers understand and effectively use this distributed messaging system.

JD Cloud Developers

Oct 25, 2023

Master Kafka: Core Concepts, Architecture, and Practical Tips

What is Kafka

Kafka is a distributed messaging system that provides decoupling, asynchronous communication, flow control, and offers strong ordering guarantees and replay capabilities beyond traditional MQ.

Common Kafka Concepts

Topic and Partition

Topic is a logical grouping of messages; a partition is a physical slice of a topic stored on brokers to improve throughput. Each partition has multiple replicas for fault tolerance.

Broker and Partition

Each Kafka process is a broker; partitions reside on brokers with replicas across different brokers. One replica is elected leader; others are followers. Leader handles reads/writes; followers sync from leader. If leader fails, a follower from ISR becomes new leader.

Producer, Consumer and ZooKeeper

Producers send messages to brokers; consumers read from brokers. ZooKeeper stores cluster metadata, elects the controller, and manages broker registration.

Consumer Groups

Consumer groups allow parallel consumption where each message is processed by only one consumer in the group. Group coordination involves a coordinator, a leader consumer, and follower consumers.

ISR, HW, LEO

ISR (In‑Sync Replicas) are replicas fully caught up with the leader. HW (High Watermark) is the smallest LEO (Log End Offset) among ISR, defining the offset up to which consumers can read. LEO is the next offset to be written in a partition.

Kafka Full Process

Cluster Registration

ZooKeeper must be started first; each broker registers with ZooKeeper and participates in controller election. The elected controller exchanges metadata with ZooKeeper and other brokers to keep the cluster state consistent.

Create Topic

Clients request topic creation; the request is stored in ZooKeeper, propagated to the controller, and brokers create the specified partitions and replicas.

Producer Sends Data

Producer creates a ProducerRecord containing the topic, optional partition, key, value, and timestamp. The key determines the partition for ordering. The record is serialized, metadata is fetched, and the appropriate partition is selected (explicit partition, key hash, or round‑robin). Data is buffered and sent in batches to improve throughput. Kafka uses an enhanced reactor network model with an acceptor thread and multiple processor threads.

private final String topic;</code><code>private final Integer partition;</code><code>private final Headers headers;</code><code>private final K key;</code><code>private final V value;</code><code>private final Long timestamp;

Consumer Consumes Data

Consumers in a group register with the coordinator, which assigns partitions. Consumption proceeds in two phases: registration and data fetching. Consumers poll messages, deserialize them, process the payload, and commit offsets. Offsets can be auto‑committed, manually committed synchronously ( commitSync()), asynchronously ( commitAsync()), or a mix of both. Offsets are stored in the internal __consumer_offsets topic.

try { while (true) { ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(100)); for (ConsumerRecord<String, String> record : records) { int updateCount = 1; if (map.containsKey(record.value())) { updateCount = (int) map.get(record.value() + 1); } map.put(record.value(), updateCount); } } } finally { consumer.close(); }

while (true) { ConsumerRecords records = consumer.poll(100); for (ConsumerRecord record : records) { log.trace("Kafka消费信息ConsumerRecord={}",record.toString()); } try { consumer.commitAsync(); } catch (CommitFailedException e) { log.error("commitAsync failed", e); } finally { try { consumer.commitSync(); } catch (CommitFailedException e) { log.error("commitSync failed", e); } finally { consumer.close(); } } }

Exception Handling

Uncaught exceptions during processing cause message retries. Kafka first performs local retries; if they fail, the message is placed in a server‑side retry queue. Proper exception handling prevents loss of subsequent messages.

JMQ treats uncaught exceptions as consumption failures, performing two local retries before moving the message to a retry queue that expires after a configurable period.

// Example of handling exceptions during polling
ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(60L));
for (ConsumerRecord<String, String> record : records) {
    try {
        // processing logic
    } catch (Exception e) {
        log.error("Bdp监听任务执行失败, taskName:{}", taskName, e);
    }
}

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Distributed Systems java Kafka Replication Message Queue Producer Consumer Offset Management

Written by

JD Cloud Developers

JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.