Why Apache Kafka Outperforms Traditional Message Queues: Architecture & Use Cases

This article explains Apache Kafka’s distributed publish‑subscribe design, its core components, storage model, broker behavior, integration with ZooKeeper, performance comparisons with RabbitMQ and ActiveMQ, and provides a practical example application illustrating producer and consumer APIs.

Java Backend Technology

Aug 15, 2017

Introduction

Apache Kafka is a distributed publish-subscribe messaging system originally developed at LinkedIn and now an Apache project. It is fast, scalable, and distributed by design, and it provides a partitioned, replicated commit-log service.

Compared with traditional messaging systems, Kafka is designed to be horizontally scalable, offers high throughput for both publishing and subscribing, supports multiple consumers with automatic load balancing, and persists messages to disk for batch and real‑time processing.

This article focuses on Kafka’s architecture, features, and advantages over traditional services, and demonstrates a sample application that uses Kafka as a messaging server.

Architecture

Kafka’s basic concepts include:

Topic – a stream of messages of a specific type; a message is a byte payload and a topic is the name of the feed.

Producer – any client that publishes messages to a topic.

Broker – a server that stores published messages; a Kafka cluster consists of one or more brokers.

Consumer – subscribes to one or more topics and pulls data from brokers.

Producers can choose serialization methods and batch multiple messages in a single request. The following image shows a producer example.
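The batching idea can be sketched in plain Java. This is a toy model rather than the Kafka client: `BatchingSender`, its `batchSize` parameter, and the `transport` callback that stands in for one network request are all hypothetical names chosen for illustration, and UTF-8 is just one possible serialization.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy model of producer-side batching; not the Kafka client API.
final class BatchingSender {
    private final int batchSize;                     // messages per request (assumed knob)
    private final Consumer<List<byte[]>> transport;  // stands in for one network request
    private final List<byte[]> buffer = new ArrayList<>();

    BatchingSender(int batchSize, Consumer<List<byte[]>> transport) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        this.batchSize = batchSize;
        this.transport = transport;
    }

    // Serialize the message (UTF-8 here; the producer picks the format) and buffer it.
    void send(String message) {
        buffer.add(message.getBytes(StandardCharsets.UTF_8));
        if (buffer.size() >= batchSize) flush();
    }

    // Ship whatever is buffered as a single request.
    void flush() {
        if (buffer.isEmpty()) return;
        transport.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    public static void main(String[] args) {
        List<Integer> requestSizes = new ArrayList<>();
        BatchingSender sender = new BatchingSender(50, batch -> requestSizes.add(batch.size()));
        for (int i = 0; i < 120; i++) sender.send("message-" + i);
        sender.flush(); // send the partial final batch
        System.out.println(requestSizes); // prints [50, 50, 20]
    }
}
```

With a batch size of 50, 120 messages travel in three requests instead of 120, which is where the saving in per-request overhead comes from.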

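Consumers create one or more streams for a topic, and the messages published to that topic are distributed evenly across those streams. Each stream is an iterator over an endless sequence of messages that blocks when no data is available. Kafka supports both the point-to-point model, in which multiple consumers jointly consume a single copy of a topic's messages, and the publish-subscribe model, in which each consumer receives its own copy. The following image shows a consumer example.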
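One way to picture the two delivery models is through consumer groups, Kafka's term for a set of consumers that share the work of reading a topic. The sketch below is a simplified round-robin assignment, assuming a hypothetical `GroupAssignment` helper; Kafka's actual assignment logic is more involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy partition assignment; Kafka's real assignors are more involved.
final class GroupAssignment {
    // Spread partitions 0..partitionCount-1 round-robin over one group's consumers.
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        if (consumers.isEmpty()) throw new IllegalArgumentException("group has no consumers");
        Map<String, List<Integer>> owned = new LinkedHashMap<>();
        for (String c : consumers) owned.put(c, new ArrayList<>());
        for (int p = 0; p < partitionCount; p++) {
            owned.get(consumers.get(p % consumers.size())).add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        // Point-to-point: one group, two consumers, each partition read by exactly one.
        System.out.println(assign(List.of("a1", "a2"), 4)); // prints {a1=[0, 2], a2=[1, 3]}
        // Publish-subscribe: a second group is assigned independently and sees everything.
        System.out.println(assign(List.of("b1"), 4));       // prints {b1=[0, 1, 2, 3]}
    }
}
```

Within a group the partitions are divided, so each message is handled once; across groups every partition is assigned again, so each group gets its own copy of the topic.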

The overall architecture is illustrated in the next diagram, showing multiple brokers, partitions, and concurrent producers and consumers.

Kafka Storage

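Each partition of a topic corresponds to a logical log, which is stored on disk as a series of segment files. The broker appends each published message to the latest segment. For performance, segment files are flushed to disk only after a configurable number of messages has been published or a set amount of time has elapsed, and a message becomes visible to consumers only after it has been flushed.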

Unlike traditional systems, Kafka does not assign explicit message IDs. Instead, messages are identified by their offset within the log, eliminating the need for separate indexing structures. Consumers read messages sequentially by offset, and Kafka uses the sendfile API to efficiently transfer bytes from broker segment files to consumers.
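The offset-as-address idea can be illustrated with a small in-memory log. This is a sketch under stated assumptions: `PartitionLog` is a hypothetical class, the 4-byte length prefix is an invented record format, and offsets are byte positions as in Kafka's early design (newer Kafka versions number messages sequentially instead).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Toy in-memory partition log; offsets are byte positions, so no index is needed.
final class PartitionLog {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();

    // Append a length-prefixed record and return the offset where it starts.
    long append(byte[] payload) {
        long offset = bytes.size();
        bytes.write(ByteBuffer.allocate(4).putInt(payload.length).array(), 0, 4);
        bytes.write(payload, 0, payload.length);
        return offset;
    }

    // Read the record that starts at the given offset (copies the log; fine for a toy).
    byte[] read(long offset) {
        ByteBuffer buf = ByteBuffer.wrap(bytes.toByteArray());
        buf.position((int) offset);
        byte[] payload = new byte[buf.getInt()];
        buf.get(payload);
        return payload;
    }

    // Offset of the following record: current offset plus this record's stored size.
    long nextOffset(long offset) {
        return offset + 4 + read(offset).length;
    }

    long endOffset() {
        return bytes.size();
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        log.append("first".getBytes(StandardCharsets.UTF_8));
        log.append("second".getBytes(StandardCharsets.UTF_8));
        // Sequential scan: follow offsets from 0 until the end of the log.
        for (long o = 0; o < log.endOffset(); o = log.nextOffset(o)) {
            System.out.println(o + " -> " + new String(log.read(o), StandardCharsets.UTF_8));
        }
        // prints "0 -> first" then "9 -> second"
    }
}
```

Because the offset is itself the location of the record, finding a message needs no lookup table that maps message IDs to positions.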

Kafka Brokers

In this design Kafka brokers are stateless with respect to consumers: they do not track how much each consumer has read. Each consumer maintains its own position, so it can rewind to an older offset and reprocess data, a flexibility that traditional queues do not offer.
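A minimal sketch of consumer-owned position, assuming a hypothetical `RewindableConsumer` class in which a plain list stands in for one partition and offsets are simply list indexes:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: the broker side is just a list; the consumer alone remembers its position.
final class RewindableConsumer {
    private final List<String> log;  // stands in for one partition on a broker
    private int position = 0;        // consumer-owned state: next offset to read

    RewindableConsumer(List<String> log) {
        this.log = log;
    }

    // Pull up to max messages starting at the current position.
    List<String> poll(int max) {
        int end = Math.min(log.size(), position + max);
        List<String> batch = new ArrayList<>(log.subList(position, end));
        position = end;
        return batch;
    }

    // Rewind (or skip ahead) by choosing a new offset; the broker is not involved.
    void seek(int offset) {
        if (offset < 0 || offset > log.size()) {
            throw new IllegalArgumentException("bad offset " + offset);
        }
        position = offset;
    }

    public static void main(String[] args) {
        RewindableConsumer c = new RewindableConsumer(List.of("m0", "m1", "m2", "m3"));
        System.out.println(c.poll(3)); // prints [m0, m1, m2]
        c.seek(1);                     // e.g. reprocess after fixing a downstream bug
        System.out.println(c.poll(3)); // prints [m1, m2, m3]
    }
}
```

Nothing on the log side changes when the consumer rewinds, which is why replay is cheap in this model.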

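Because a broker does not know whether every subscriber has consumed a message, it cannot delete messages on delivery. Deletion is instead time-based: a simple SLA-driven retention policy automatically removes data older than a configured retention period. Until that period expires, consumers can replay data by seeking to earlier offsets.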
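Time-based retention can be sketched as dropping whole segments from the head of the log. The class name `RetentionSketch`, the `Segment` fields, and the seven-day period used in `main` are assumptions made for illustration only.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;

// Toy time-based retention over log segments; names and numbers are illustrative.
final class RetentionSketch {
    // A segment is described by its first offset and the time of its last append.
    static final class Segment {
        final long baseOffset;
        final Instant lastAppend;

        Segment(long baseOffset, Instant lastAppend) {
            this.baseOffset = baseOffset;
            this.lastAppend = lastAppend;
        }
    }

    // Drop the oldest segments whose newest message is past the retention period.
    // Returns the earliest offset a consumer can still seek to, or -1 if nothing is left.
    static long enforce(Deque<Segment> segments, Duration retention, Instant now) {
        Instant cutoff = now.minus(retention);
        while (!segments.isEmpty() && segments.peekFirst().lastAppend.isBefore(cutoff)) {
            segments.removeFirst();
        }
        return segments.isEmpty() ? -1 : segments.peekFirst().baseOffset;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2017-08-15T00:00:00Z");
        Deque<Segment> segments = new ArrayDeque<>();
        segments.add(new Segment(0, now.minus(Duration.ofDays(9))));    // expired
        segments.add(new Segment(1000, now.minus(Duration.ofDays(3)))); // retained
        segments.add(new Segment(2000, now));                           // active
        System.out.println(enforce(segments, Duration.ofDays(7), now)); // prints 1000
    }
}
```

Deletion depends only on age, never on who has read what, so the broker needs no per-consumer bookkeeping to decide what to discard.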

ZooKeeper and Kafka

ZooKeeper provides distributed coordination, leader election, and configuration management for Kafka clusters. An ensemble of ZooKeeper servers maintains a hierarchical namespace (znodes) that Kafka uses to register brokers, topics, and partitions.

When a broker joins or fails, ZooKeeper notifies producers and consumers so they can adjust their connections accordingly. The overall system architecture is shown below.
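The notification pattern described above can be illustrated with a small in-process registry. This is a toy stand-in, not the ZooKeeper client API: `BrokerRegistry` and its methods are hypothetical, and its watchers stay registered after firing, whereas standard ZooKeeper watches are one-time triggers that a client must set again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Consumer;

// Toy stand-in for a coordination service; this is not the ZooKeeper client API.
final class BrokerRegistry {
    private final Set<String> liveBrokers = new TreeSet<>();
    private final List<Consumer<Set<String>>> watchers = new ArrayList<>();

    // Clients (producers, consumers) subscribe to membership changes.
    void watch(Consumer<Set<String>> watcher) {
        watchers.add(watcher);
    }

    // A broker joins: record it and tell every watcher the new membership.
    void register(String brokerId) {
        if (liveBrokers.add(brokerId)) notifyWatchers();
    }

    // A broker fails or leaves: remove it and notify again.
    void unregister(String brokerId) {
        if (liveBrokers.remove(brokerId)) notifyWatchers();
    }

    private void notifyWatchers() {
        Set<String> snapshot = new TreeSet<>(liveBrokers); // copy so watchers cannot mutate state
        for (Consumer<Set<String>> w : watchers) w.accept(snapshot);
    }

    public static void main(String[] args) {
        BrokerRegistry registry = new BrokerRegistry();
        registry.watch(brokers -> System.out.println("producer now sees " + brokers));
        registry.register("broker-1");   // prints: producer now sees [broker-1]
        registry.register("broker-2");   // prints: producer now sees [broker-1, broker-2]
        registry.unregister("broker-1"); // prints: producer now sees [broker-2]
    }
}
```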

Apache Kafka vs Other Messaging Services

LinkedIn conducted performance experiments comparing Kafka with Apache ActiveMQ 5.4 and RabbitMQ 2.4. They ran a producer test publishing 10 million 200‑byte messages, using batch sizes of 1 and 50 for Kafka. Results showed Kafka achieving far higher throughput.

Key reasons for Kafka’s superior performance:

The Kafka producer does not wait for acknowledgments from the broker; it sends messages as fast as the broker can accept them.

Kafka’s storage format is more efficient; each message adds only about 9 bytes of overhead versus 144 bytes for ActiveMQ.
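To put the overhead figures in perspective, the arithmetic below applies them to the producer test. It assumes the quoted per-message overheads (9 and 144 bytes) hold for the 200-byte messages used in that test; `OverheadMath` is a hypothetical helper, not part of any benchmark tooling.

```java
// Back-of-the-envelope arithmetic using only the figures quoted above.
final class OverheadMath {
    // Total bytes stored or sent for a run of equally sized messages.
    static long totalBytes(long messages, int payloadBytes, int overheadBytes) {
        return messages * (payloadBytes + overheadBytes);
    }

    public static void main(String[] args) {
        long n = 10_000_000L;
        long kafka = totalBytes(n, 200, 9);      // 2,090,000,000 bytes
        long activeMq = totalBytes(n, 200, 144); // 3,440,000,000 bytes
        System.out.println("Kafka:    " + kafka + " bytes");
        System.out.println("ActiveMQ: " + activeMq + " bytes");
        // Extra bytes ActiveMQ handles relative to Kafka, as a whole-number percentage.
        System.out.println("Extra:    " + Math.round(100.0 * (activeMq - kafka) / kafka) + "%");
    }
}
```

Under these assumptions ActiveMQ handles roughly 65% more bytes than Kafka for the same payload, before any difference in acknowledgment handling is considered.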

Consumer tests also favored Kafka. Consumers pulled 10 million messages from each system with similar pre-fetch settings. Two factors contributed to Kafka's advantage:

Kafka's more compact storage format means fewer bytes are transferred from the broker to the consumer.

ActiveMQ and RabbitMQ brokers must maintain the delivery state of every message, which incurs additional disk I/O. The Kafka broker keeps no such state, and it uses the sendfile API to reduce transmission overhead.

Example Application

The sample application demonstrates basic Kafka producer and consumer APIs. It reads email files from a directory, publishes them to a topic, and consumes them for further processing. The architecture diagram below shows the components and their interactions.

Producer code (shown as an image) configures properties such as the target topic, serializer class, and broker list, then watches a directory for new email dump files and publishes them.

Consumer code (also an image) sets up a stream, iterates over incoming messages, and prints them to the console or forwards them to a parsing system.
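Since the original listings are available only as images, the flow can be approximated with standard-library Java. This is not the author's code and it does not use the Kafka client: a `BlockingQueue` stands in for the topic, the directory is scanned once instead of being watched, and `EmailPipelineSketch`, `publishAll`, and `consumeOne` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Toy pipeline: a BlockingQueue stands in for the Kafka topic; no Kafka client is used.
final class EmailPipelineSketch {
    // Producer side: publish the bytes of every regular file in the directory.
    static int publishAll(Path dir, BlockingQueue<byte[]> topic) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(dir)) {
            files = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        for (Path file : files) topic.add(Files.readAllBytes(file));
        return files.size();
    }

    // Consumer side: block until the next message arrives, then return its text.
    static String consumeOne(BlockingQueue<byte[]> topic) throws InterruptedException {
        return new String(topic.take(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("mail-dump"); // hypothetical dump directory
        Files.write(dir.resolve("a.eml"), "Subject: quote 1".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("b.eml"), "Subject: quote 2".getBytes(StandardCharsets.UTF_8));

        BlockingQueue<byte[]> topic = new LinkedBlockingQueue<>();
        int published = publishAll(dir, topic);
        // Print each message; a real consumer might forward it to a parsing system.
        for (int i = 0; i < published; i++) System.out.println(consumeOne(topic));
    }
}
```

Unlike this queue, a Kafka topic keeps messages after they are read, so a real consumer could rewind and parse the same emails again.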

In the author’s production environment, Kafka replaces a JMS queue, providing higher throughput, reliable message replay, and better handling of large attachments for OTC pricing data.

Summary

Kafka is a modern system for handling massive data streams. Its pull‑based consumption model lets consumers process messages at their own pace, and its ability to re‑consume messages enables robust error handling and flexible downstream processing.
