Master Kafka: Core Concepts, Architecture, and Practical Tips
This article explains Kafka's fundamentals, including topics, partitions, brokers, replication, producer‑consumer workflow, consumer groups, offset management, and common exception handling, while providing code examples and diagrams to help developers understand and effectively use this distributed messaging system.
What is Kafka
Kafka is a distributed messaging system that provides decoupling, asynchronous communication, flow control, and offers strong ordering guarantees and replay capabilities beyond traditional MQ.
Common Kafka Concepts
Topic and Partition
Topic is a logical grouping of messages; a partition is a physical slice of a topic stored on brokers to improve throughput. Each partition has multiple replicas for fault tolerance.
Broker and Partition
Each Kafka process is a broker; partitions reside on brokers with replicas across different brokers. One replica is elected leader; others are followers. Leader handles reads/writes; followers sync from leader. If leader fails, a follower from ISR becomes new leader.
Producer, Consumer and ZooKeeper
Producers send messages to brokers; consumers read from brokers. ZooKeeper stores cluster metadata, elects the controller, and manages broker registration.
Consumer Groups
Consumer groups allow parallel consumption where each message is processed by only one consumer in the group. Group coordination involves a coordinator, a leader consumer, and follower consumers.
ISR, HW, LEO
ISR (In‑Sync Replicas) are replicas fully caught up with the leader. HW (High Watermark) is the smallest LEO (Log End Offset) among ISR, defining the offset up to which consumers can read. LEO is the next offset to be written in a partition.
Kafka Full Process
Cluster Registration
ZooKeeper must be started first; each broker registers with ZooKeeper and participates in controller election. The elected controller exchanges metadata with ZooKeeper and other brokers to keep the cluster state consistent.
Create Topic
Clients request topic creation; the request is stored in ZooKeeper, propagated to the controller, and brokers create the specified partitions and replicas.
Producer Sends Data
Producer creates a ProducerRecord containing the topic, optional partition, key, value, and timestamp. The key determines the partition for ordering. The record is serialized, metadata is fetched, and the appropriate partition is selected (explicit partition, key hash, or round‑robin). Data is buffered and sent in batches to improve throughput. Kafka uses an enhanced reactor network model with an acceptor thread and multiple processor threads.
private final String topic;</code><code>private final Integer partition;</code><code>private final Headers headers;</code><code>private final K key;</code><code>private final V value;</code><code>private final Long timestamp;Consumer Consumes Data
Consumers in a group register with the coordinator, which assigns partitions. Consumption proceeds in two phases: registration and data fetching. Consumers poll messages, deserialize them, process the payload, and commit offsets. Offsets can be auto‑committed, manually committed synchronously ( commitSync()), asynchronously ( commitAsync()), or a mix of both. Offsets are stored in the internal __consumer_offsets topic.
try { while (true) { ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(100)); for (ConsumerRecord<String, String> record : records) { int updateCount = 1; if (map.containsKey(record.value())) { updateCount = (int) map.get(record.value() + 1); } map.put(record.value(), updateCount); } } } finally { consumer.close(); } while (true) { ConsumerRecords records = consumer.poll(100); for (ConsumerRecord record : records) { log.trace("Kafka消费信息ConsumerRecord={}",record.toString()); } try { consumer.commitAsync(); } catch (CommitFailedException e) { log.error("commitAsync failed", e); } finally { try { consumer.commitSync(); } catch (CommitFailedException e) { log.error("commitSync failed", e); } finally { consumer.close(); } } }Exception Handling
Uncaught exceptions during processing cause message retries. Kafka first performs local retries; if they fail, the message is placed in a server‑side retry queue. Proper exception handling prevents loss of subsequent messages.
JMQ treats uncaught exceptions as consumption failures, performing two local retries before moving the message to a retry queue that expires after a configurable period.
// Example of handling exceptions during polling
ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(60L));
for (ConsumerRecord<String, String> record : records) {
try {
// processing logic
} catch (Exception e) {
log.error("Bdp监听任务执行失败, taskName:{}", taskName, e);
}
}Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
JD Cloud Developers
JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
