
Understanding Apache Kafka: Architecture, Core Principles, and Use Cases

This article introduces Apache Kafka as a fast, scalable, distributed publish‑subscribe system and explains its core components, ZooKeeper coordination, startup workflow, key features, and common use cases such as log collection, activity tracking, and stream processing.

Java Backend Technology

Aug 16, 2017


1. Introduction

Apache Kafka is a distributed publish‑subscribe messaging system originally developed at LinkedIn. It was open-sourced in early 2011, entered the Apache Incubator later that year, and became a top-level Apache Software Foundation project in 2012. It provides a fast, scalable, partitioned, and replicated commit log service.
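To make "partitioned commit log" concrete, here is a toy, in-memory model. It is not Kafka's API: the class and method names (`PartitionedLog`, `append`, `read`) are hypothetical. Each partition is an append-only sequence, and every record is addressed by its offset within that partition.

```python
class PartitionedLog:
    """Toy model of a Kafka topic: N append-only partitions (not Kafka's API)."""

    def __init__(self, num_partitions: int) -> None:
        # Each partition is an ordered, append-only list of records.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition: int, record: bytes) -> int:
        """Append a record and return its offset within that partition."""
        log = self.partitions[partition]
        log.append(record)
        return len(log) - 1

    def read(self, partition: int, offset: int, max_records: int = 10) -> list:
        """Return up to max_records records starting at offset; reading removes nothing."""
        return self.partitions[partition][offset:offset + max_records]


topic = PartitionedLog(num_partitions=3)
print(topic.append(0, b"a"))  # 0
print(topic.append(0, b"b"))  # 1
print(topic.append(2, b"c"))  # 0 (offsets are counted per partition)
print(topic.read(0, 1))       # [b'b']
```

Because reads are non-destructive, several independent readers can consume the same partition from different offsets.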

2. Basic Architecture

The main components are:

Topic – a category or feed name to which messages are published; each topic is split into one or more ordered partitions.

Producer – any entity that publishes messages to a topic.

Broker – a server that stores published messages; a Kafka cluster consists of multiple brokers.

Consumer – an entity that subscribes to one or more topics and pulls data from brokers.

The diagram shows producers sending data to brokers, brokers holding multiple topics, and consumers pulling data from brokers.
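A producer decides which partition of a topic each message goes to. The sketch below assumes key-based partitioning: the same key always maps to the same partition. Kafka's default partitioner hashes keys with murmur2; `zlib.crc32` is swapped in here only so the example needs nothing beyond the standard library, and `KeyPartitioner` is a hypothetical name.

```python
import itertools
import zlib


class KeyPartitioner:
    """Toy partitioner: same key -> same partition (not Kafka's implementation)."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition(self, key) -> int:
        if key is None:
            # Simplification: spread unkeyed records round-robin (older Kafka
            # producers did this; newer ones use a batch-based "sticky" strategy).
            return next(self._round_robin)
        # crc32 stands in for Kafka's murmur2 key hash.
        return zlib.crc32(key) % self.num_partitions


p = KeyPartitioner(num_partitions=4)
orders = [(b"user-1", "created"), (b"user-2", "created"), (b"user-1", "paid")]
for key, event in orders:
    print(key, event, "-> partition", p.partition(key))
```

Because every event for `user-1` lands in one partition, a consumer sees those events in the order they were produced. Kafka guarantees ordering only within a partition, not across a whole topic.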

3. Core Principles

Producers publish data to brokers, which store it; consumers pull data from brokers at their own pace for processing. The system is distributed: producers, brokers, and consumers can run on separate machines, and the brokers coordinate with each other through ZooKeeper.
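The pull model means each consumer tracks its own position (offset) and asks for the next batch when it is ready. Below is a minimal sketch for a single partition, assuming a shared offset store keyed by a consumer-group name. `PullConsumer`, `poll`, and `commit` here are hypothetical names, not the Kafka client API.

```python
class PullConsumer:
    """Toy pull-based consumer for one partition (not Kafka's API)."""

    def __init__(self, partition_log: list, committed: dict, group: str) -> None:
        self.log = partition_log                 # the broker-side partition
        self.committed = committed               # shared store: group -> offset
        self.group = group
        self.position = committed.get(group, 0)  # resume where the group left off

    def poll(self, max_records: int = 2) -> list:
        """Pull the next batch; the consumer, not the broker, sets the pace."""
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def commit(self) -> None:
        """Record progress so a restarted consumer continues from here."""
        self.committed[self.group] = self.position


log = ["m0", "m1", "m2", "m3", "m4"]
offsets = {}
c = PullConsumer(log, offsets, group="billing")
print(c.poll())                         # ['m0', 'm1']
c.commit()
restarted = PullConsumer(log, offsets, group="billing")
print(restarted.poll(max_records=10))   # ['m2', 'm3', 'm4']
```

Committing only after processing gives at-least-once behaviour in this toy: if the consumer crashes between `poll` and `commit`, the same batch is read again after restart.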

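4. Role of ZooKeeper

ZooKeeper stores the Kafka cluster's metadata, such as the list of live brokers, topic and partition configuration, and which broker acts as the controller. It coordinates the brokers, enabling high availability and leader failover. Older Kafka consumers also kept group membership and offsets in ZooKeeper for subscription management and load balancing; since Kafka 0.9 the brokers handle this themselves. Recent releases can run without ZooKeeper in KRaft mode, and Kafka 4.0 removed ZooKeeper support entirely.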
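In ZooKeeper-based deployments, each broker registers itself as an ephemeral znode under `/brokers/ids/<id>`. If the broker's ZooKeeper session expires, the node disappears and the broker drops out of the live-broker list automatically. The sketch below models only that idea in memory. `ToyRegistry` and the host strings are hypothetical, and this is not the ZooKeeper client API.

```python
class ToyRegistry:
    """In-memory stand-in for ZooKeeper ephemeral nodes (not the ZooKeeper API)."""

    def __init__(self) -> None:
        self.nodes = {}  # path -> (data, owning session id)

    def create_ephemeral(self, path: str, data: str, session: int) -> None:
        if path in self.nodes:
            # e.g. two brokers configured with the same broker id
            raise ValueError(f"{path} already exists")
        self.nodes[path] = (data, session)

    def expire_session(self, session: int) -> None:
        """Ephemeral nodes vanish when the session that created them ends."""
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session}

    def children(self, parent: str) -> list:
        prefix = parent.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.nodes if p.startswith(prefix))


zk = ToyRegistry()
zk.create_ephemeral("/brokers/ids/1", "host1:9092", session=101)
zk.create_ephemeral("/brokers/ids/2", "host2:9092", session=102)
print(zk.children("/brokers/ids"))  # ['1', '2']
zk.expire_session(101)              # broker 1 crashes or loses its session
print(zk.children("/brokers/ids"))  # ['2']
```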

5. Execution Process

Typical startup sequence:

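Start the ZooKeeper servers.

Start the Kafka brokers, which register themselves in ZooKeeper.

Producers connect to the brokers, fetch metadata about which broker leads each partition, and send messages to those leaders. Early producers discovered brokers through ZooKeeper; current producers are configured with a list of bootstrap brokers instead.

Consumers connect to the brokers in the same way and pull messages from them.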
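In current Kafka releases, clients are given a short list of bootstrap brokers (the `bootstrap.servers` setting), request cluster metadata from whichever one answers, and then talk directly to the leader of each partition. The sketch below models that lookup in memory. `Broker`, `discover_leader`, and the topic name `logs` are hypothetical; this is not a real Kafka client call.

```python
from dataclasses import dataclass


@dataclass
class Broker:
    broker_id: int
    alive: bool
    # Cluster metadata any broker can serve: (topic, partition) -> leader broker id
    leaders: dict


def discover_leader(bootstrap: list, topic: str, partition: int) -> int:
    """Ask bootstrap brokers in turn for metadata; return the partition leader."""
    for broker in bootstrap:
        if not broker.alive:
            continue  # unreachable: try the next bootstrap address
        return broker.leaders[(topic, partition)]
    raise RuntimeError("no bootstrap broker reachable")


metadata = {("logs", 0): 2, ("logs", 1): 3}
cluster = [Broker(1, alive=False, leaders=metadata),
           Broker(2, alive=True, leaders=metadata)]
print(discover_leader(cluster, "logs", 1))  # 3
```

Listing more than one bootstrap broker matters only for this first contact: once any broker answers, the client learns about the whole cluster from the metadata.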

6. Kafka Features

High throughput and low latency (hundreds of thousands of messages per second, millisecond latency).

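Scalability – brokers can be added to a running cluster without downtime; existing partitions are then reassigned to spread load onto the new brokers.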

Durability and reliability – messages are persisted to disk with replication.

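Fault tolerance – the cluster keeps serving a partition through broker failures as long as at least one in-sync replica of that partition survives.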

High concurrency – thousands of clients can read/write simultaneously.

Supports both real‑time stream processing (e.g., Storm, Spark Streaming) and batch processing (e.g., Hadoop).
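Durability and fault tolerance rest on replication: each partition has a leader and followers, and a record counts as committed once every in-sync replica (ISR) has it. Consumers read only up to this "high watermark", and with the producer setting `acks=all` the leader acknowledges a write only after the full ISR has it. The sketch computes the high watermark from each replica's log-end offset; `high_watermark` is a hypothetical helper, not a Kafka function.

```python
def high_watermark(log_end_offsets: dict, isr: set) -> int:
    """Offset below which records are committed (toy model).

    A record is committed once every in-sync replica has it, so the high
    watermark is the smallest log-end offset among the ISR members.
    """
    return min(log_end_offsets[replica] for replica in isr)


# Broker 1 leads; broker 3 has fallen behind and was dropped from the ISR.
leo = {1: 120, 2: 118, 3: 90}
print(high_watermark(leo, isr={1, 2}))     # 118
print(high_watermark(leo, isr={1, 2, 3}))  # 90
```

Dropping a slow follower from the ISR keeps the watermark advancing, at the cost of one fewer copy guarding newly committed records until the follower catches up.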

7. Typical Use Cases

Log collection and centralisation.

Message decoupling between producers and consumers.

User activity tracking for web or app interactions.

Operational metrics aggregation.

Stream processing pipelines (Spark Streaming, Storm).

Event sourcing.
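Event sourcing fits Kafka because a topic is a durable, ordered history: current state is derived by replaying events from the beginning. A toy sketch, assuming a hypothetical account-ledger schema of `(kind, account, amount)` tuples:

```python
from collections import defaultdict


def replay(events: list) -> dict:
    """Rebuild current balances by folding over the event log from offset 0."""
    balances = defaultdict(int)
    for kind, account, amount in events:
        if kind == "deposit":
            balances[account] += amount
        elif kind == "withdraw":
            balances[account] -= amount
        else:
            raise ValueError(f"unknown event type: {kind}")
    return dict(balances)


ledger = [("deposit", "alice", 100), ("deposit", "bob", 50), ("withdraw", "alice", 30)]
print(replay(ledger))  # {'alice': 70, 'bob': 50}
```

Because the log is never rewritten, a new service can build its own view of the same state simply by replaying the topic from the start.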


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
