
Why Apache Kafka Beats Traditional Message Queues: Architecture, Code, and Performance

This article explains Apache Kafka's distributed publish‑subscribe design, core components, storage model, ZooKeeper coordination, performance benchmarks against ActiveMQ and RabbitMQ, and provides Java producer and consumer code examples for building high‑throughput messaging applications.

21CTO

Sep 14, 2015

Introduction

Apache Kafka is a distributed publish‑subscribe messaging system originally developed at LinkedIn and now an Apache project. It is fast, scalable, partitioned and replicated.

Compared with traditional messaging systems, Kafka is designed as a distributed system that scales out easily, provides high throughput for both publishing and subscribing, supports multiple consumers with automatic load balancing, and persists messages to disk for batch and real‑time use.

Architecture

Key components are:

Topic : a category of messages; each message is a byte payload.

Producer : publishes messages to a topic.

Broker : a server that stores published messages; a Kafka cluster is a set of brokers.

Consumer : subscribes to one or more topics and pulls data from brokers.

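A producer can batch messages before sending and can choose the serializer used to encode them. Simplified producer pseudocode:

// Constructor arguments are elided in the original
producer = new Producer(...);
// A message is just a byte payload
message = new Message("test message str".getBytes());
set = new MessageSet(message);
// Publish the message set to a topic
producer.send("topic1", set);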
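The batching mentioned above can be sketched without a Kafka dependency. The class `BatchingSender`, its `batchSize` parameter, and the `transport` callback are hypothetical names chosen for illustration and are not part of Kafka's API; the sketch only shows the idea of buffering messages and handing them over in groups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of producer-side batching; not Kafka's API.
final class BatchingSender {
    private final int batchSize;
    private final Consumer<List<byte[]>> transport; // stands in for a network send
    private final List<byte[]> buffer = new ArrayList<>();

    BatchingSender(int batchSize, Consumer<List<byte[]>> transport) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.transport = transport;
    }

    // Buffer one message; ship the whole buffer once it reaches batchSize.
    void send(byte[] message) {
        buffer.add(message);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Ship whatever is buffered, even if the batch is not full.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        transport.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

Sending seven messages with a batch size of three results in three calls to the transport (two full batches and, after a final `flush()`, one partial batch) instead of seven separate sends.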

A consumer creates one or more message streams for a topic and iterates over them. The iterator blocks when no messages are available, so the loop runs indefinitely. Simplified consumer pseudocode:

streams[] = Consumer.createMessageStreams("topic1", 1);
for (message : streams[0]) {
    bytes = message.payload();
    // process bytes
}

Kafka clusters consist of multiple brokers; topics are split into partitions distributed across brokers.
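As a rough illustration of how partitions spread a topic across brokers, the sketch below maps a message key to a partition and a partition to a broker. `PartitionLayout`, the hash-modulo rule, and the round-robin placement are assumptions made for this example; Kafka's actual partitioner and assignment logic differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: key -> partition -> broker. Not Kafka's real algorithm.
final class PartitionLayout {
    private final int numPartitions;
    private final List<String> brokers;

    PartitionLayout(int numPartitions, List<String> brokers) {
        if (numPartitions < 1 || brokers.isEmpty()) {
            throw new IllegalArgumentException("need at least one partition and one broker");
        }
        this.numPartitions = numPartitions;
        this.brokers = new ArrayList<>(brokers);
    }

    // Hash the key into [0, numPartitions); floorMod keeps negative hashes in range.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Place partitions on brokers round-robin.
    String brokerFor(int partition) {
        return brokers.get(partition % brokers.size());
    }
}
```

Because the same key always hashes to the same partition, messages that share a key land in the same log, while different keys spread the load over all brokers.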

Storage

Each partition is a logical log, physically stored as a set of segment files. The broker appends each new message to the latest segment and flushes it to disk based on configurable size or time thresholds. Messages carry no explicit ID; each one is addressed by its offset in the log.
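The segment-and-offset idea can be modelled in memory. In this sketch `SegmentedLog` and `maxSegmentMessages` are hypothetical names, segments are plain lists rather than files, and offsets simply count messages, which is a simplification for readability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of a partition log; real segments are files on disk.
final class SegmentedLog {
    private final int maxSegmentMessages;
    // Base offset of each segment -> messages stored in that segment.
    private final TreeMap<Long, List<byte[]>> segments = new TreeMap<>();
    private long nextOffset = 0;

    SegmentedLog(int maxSegmentMessages) {
        if (maxSegmentMessages < 1) {
            throw new IllegalArgumentException("maxSegmentMessages must be >= 1");
        }
        this.maxSegmentMessages = maxSegmentMessages;
        segments.put(0L, new ArrayList<>());
    }

    // Append to the newest segment, rolling to a new one when it is full.
    long append(byte[] message) {
        List<byte[]> active = segments.lastEntry().getValue();
        if (active.size() >= maxSegmentMessages) {
            active = new ArrayList<>();
            segments.put(nextOffset, active);
        }
        active.add(message);
        return nextOffset++;
    }

    // Locate a message: pick the segment with the greatest base offset <= offset.
    byte[] read(long offset) {
        if (offset < 0 || offset >= nextOffset) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        Map.Entry<Long, List<byte[]>> segment = segments.floorEntry(offset);
        return segment.getValue().get((int) (offset - segment.getKey()));
    }

    int segmentCount() {
        return segments.size();
    }
}
```

Keeping segments sorted by base offset means a lookup needs only the offset itself: no per-message index of IDs is required.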

Consumers read sequentially by offset; Kafka uses the sendfile API to transfer bytes efficiently.
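In Java, the usual way to reach sendfile-style transfers is `FileChannel.transferTo`, which can avoid copying bytes through application buffers when the platform and target channel support it. The sketch below is a minimal illustration under that assumption; the class name `ZeroCopyTransfer` and its method are hypothetical and this is not Kafka's own code.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper showing FileChannel.transferTo; not taken from Kafka.
final class ZeroCopyTransfer {
    // Copy up to `count` bytes starting at `position` from a file to a channel.
    // Returns the number of bytes actually transferred.
    static long transfer(Path file, long position, long count, WritableByteChannel target)
            throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long sent = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (sent < count) {
                long n = source.transferTo(position + sent, count - sent, target);
                if (n <= 0) {
                    break; // end of file, or the target accepted nothing
                }
                sent += n;
            }
            return sent;
        }
    }
}
```

A consumer fetch maps naturally onto this call: the offset selects the starting position in a segment file and the fetch size bounds the byte count.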

Broker

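Brokers keep no per-consumer state: each consumer, not the broker, tracks the offset it has read up to. Because the broker does not know which messages have been consumed, retention is time-based, and data older than a configured period is deleted automatically. A side benefit is that a consumer can deliberately rewind to an earlier offset and re-consume any data that is still retained.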
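A small in-memory model makes the division of responsibility concrete. `RetainedLog` and `OffsetTrackingConsumer` are hypothetical classes written for this example, and timestamps are passed in explicitly so the behavior is deterministic; none of this is Kafka's implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical in-memory log with time-based retention; not Kafka's implementation.
final class RetainedLog {
    private static final class Entry {
        final long offset;
        final long timestampMs;
        final String value;

        Entry(long offset, long timestampMs, String value) {
            this.offset = offset;
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();
    private long nextOffset = 0;

    long append(String value, long nowMs) {
        entries.addLast(new Entry(nextOffset, nowMs, value));
        return nextOffset++;
    }

    // Delete entries older than retentionMs, whether or not anyone has read them.
    void expire(long nowMs, long retentionMs) {
        while (!entries.isEmpty() && nowMs - entries.peekFirst().timestampMs > retentionMs) {
            entries.removeFirst();
        }
    }

    long earliestOffset() {
        return entries.isEmpty() ? nextOffset : entries.peekFirst().offset;
    }

    // Return the message at the offset, or null if it is expired or not yet written.
    String fetch(long offset) {
        for (Entry e : entries) {
            if (e.offset == offset) {
                return e.value;
            }
        }
        return null;
    }
}

// The consumer, not the log, remembers how far it has read.
final class OffsetTrackingConsumer {
    private final RetainedLog log;
    private long position = 0;

    OffsetTrackingConsumer(RetainedLog log) {
        this.log = log;
    }

    // Pull the next message; skip forward if the current position has expired.
    String poll() {
        position = Math.max(position, log.earliestOffset());
        String value = log.fetch(position);
        if (value != null) {
            position++;
        }
        return value;
    }

    // Rewind (or skip ahead) to any offset the consumer chooses.
    void seek(long offset) {
        position = offset;
    }
}
```

Calling `seek(0)` after reading everything replays the log from the start, but only for messages that `expire` has not yet removed.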

ZooKeeper and Kafka

ZooKeeper is a coordination service for distributed systems. Kafka uses it to manage broker membership, leader election, and configuration changes. A ZooKeeper ensemble itself consists of a leader and followers that replicate state among themselves.
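The membership and election roles described above can be illustrated with an in-memory stand-in. `CoordinationSketch` is a hypothetical class and deliberately does not use the ZooKeeper client API; it only mimics two behaviors: entries that disappear when their owner leaves, and a single leader slot that the first claimant wins. Choosing the lowest surviving id as the next leader is an assumption made to keep the example deterministic.

```java
import java.util.Set;
import java.util.TreeSet;

// In-memory stand-in for a coordination service; NOT the ZooKeeper client API.
final class CoordinationSketch {
    private final Set<Integer> liveBrokers = new TreeSet<>(); // membership entries
    private Integer leader = null;                             // single leader slot

    // A broker joins: register membership and try to claim leadership.
    void join(int brokerId) {
        liveBrokers.add(brokerId);
        if (leader == null) {
            leader = brokerId; // first claimant wins
        }
    }

    // A broker's session ends: its entries vanish and a survivor takes over.
    void leave(int brokerId) {
        liveBrokers.remove(brokerId);
        if (leader != null && leader == brokerId) {
            // Assumption for determinism: the lowest surviving id becomes leader.
            leader = liveBrokers.isEmpty() ? null : liveBrokers.iterator().next();
        }
    }

    Integer leader() {
        return leader;
    }

    Set<Integer> members() {
        return new TreeSet<>(liveBrokers);
    }
}
```

In a real deployment the surviving brokers would be notified of the change and race to claim leadership; the sketch collapses that race into a fixed rule.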

Performance Comparison

LinkedIn benchmarked Kafka against ActiveMQ and RabbitMQ. Kafka showed higher throughput for both producers and consumers due to its batch sending, compact storage format (≈9 bytes overhead per message vs. 144 bytes for ActiveMQ), and use of the sendfile API.
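To see why per-message overhead matters, the arithmetic can be worked through for a given payload size. The header sizes (9 and 144 bytes) come from the figures above; the 200-byte payload used in the note below is an assumption for illustration, and `OverheadMath` is a hypothetical helper.

```java
// Hypothetical helper for per-message overhead arithmetic.
final class OverheadMath {
    // Fraction of stored or transferred bytes that is header rather than payload.
    static double overheadFraction(int headerBytes, int payloadBytes) {
        return (double) headerBytes / (headerBytes + payloadBytes);
    }

    // Total bytes needed for a number of equally sized messages.
    static long totalBytes(long messages, int headerBytes, int payloadBytes) {
        return messages * (headerBytes + payloadBytes);
    }
}
```

With an assumed 200-byte payload, `overheadFraction(9, 200)` is about 0.043 while `overheadFraction(144, 200)` is about 0.419, so roughly 4% of the bytes are header in the first case versus roughly 42% in the second. For one million such messages, `totalBytes` gives 209,000,000 bytes against 344,000,000 bytes.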

Sample Application

A simplified Java application demonstrates Kafka producer and consumer APIs, reading email files from a directory and publishing them to a topic.

Producer example:

/**
 * Instantiates a new Kafka producer.
 * @param topic the topic
 * @param directoryPath the directory path
 */
public KafkaMailProducer(String topic, String directoryPath) {
    // props, producer, topic and directoryPath are assumed to be fields of the class (declarations not shown)
    props.put("serializer.class", "kafka.serializer.StringEncoder"); // encode message values as strings
    props.put("metadata.broker.list", "localhost:9092"); // broker(s) used to fetch cluster metadata
    producer = new kafka.javaapi.producer.Producer<Integer, String>(new ProducerConfig(props));
    this.topic = topic;
    this.directoryPath = directoryPath;
}
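The constructor above only wires up the producer; the part that reads email files from the directory is not shown. A minimal sketch of that step, assuming each regular file in the directory is one UTF-8 message, might look as follows. `MailDirectoryPublisher` and the `send` callback are hypothetical: the callback stands in for the real producer call so the example runs with the standard library alone.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BiConsumer;

// Hypothetical sketch of the directory-reading step; not the original application code.
final class MailDirectoryPublisher {
    // Read every regular file in directoryPath and pass (topic, contents) to `send`.
    // Returns the number of files published. Iteration order is unspecified.
    static int publishAll(Path directoryPath, String topic, BiConsumer<String, String> send)
            throws IOException {
        int count = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(directoryPath)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and other non-files
                }
                String body = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                send.accept(topic, body);
                count++;
            }
        }
        return count;
    }
}
```

With the producer API used in this article, the callback would typically wrap a call along the lines of `producer.send(new KeyedMessage<Integer, String>(topic, body))`; treat that as an assumption to check against the Kafka version in use.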

Consumer example:

public KafkaMailConsumer(String topic) {
    consumer = kafka.consumer.Consumer.createJavaConsumerConnector(createConsumerConfig());
    this.topic = topic;
}

// Builds the configuration for the ZooKeeper-based consumer.
private static ConsumerConfig createConsumerConfig() {
    Properties props = new Properties();
    props.put("zookeeper.connect", KafkaMailProperties.zkConnect); // ZooKeeper connection string
    props.put("group.id", KafkaMailProperties.groupId); // consumer group this consumer joins
    props.put("zookeeper.session.timeout.ms", "400");
    props.put("zookeeper.sync.time.ms", "200");
    props.put("auto.commit.interval.ms", "1000"); // how often consumed offsets are committed
    return new ConsumerConfig(props);
}

Conclusion

Kafka’s pull‑based consumption model, fault‑tolerant storage, and ability to re‑consume messages make it well‑suited for high‑volume, real‑time data pipelines.
