Understanding Apache Kafka: Features, Architecture, and Real‑World Use Cases

This article provides a comprehensive overview of Apache Kafka, covering its core features, architectural components, message flow, and common application scenarios such as log collection, decoupled messaging, activity tracking, operational monitoring, and stream processing.

Mike Chen's Internet Architecture

May 9, 2024

Kafka Overview

Apache Kafka is a distributed publish‑subscribe messaging system originally developed by LinkedIn and now an Apache top‑level project. It is written in Scala and Java and is widely used for log collection and messaging in large‑scale internet applications.

Kafka Features

1. High throughput, low latency: Depending on hardware and configuration, a Kafka cluster can handle hundreds of thousands of messages per second with latency as low as a few milliseconds.

2. Scalability: Kafka clusters support hot expansion, allowing seamless addition of nodes.

3. Durability & reliability: Messages are persisted to local disk and replicated across brokers, so the failure of a single node does not lose data.

4. High concurrency: Thousands of clients can read and write simultaneously.

Kafka Architecture

The main components are:

Topic: Logical category for messages; each message must be assigned to a topic, and consumers subscribe to topics.

Partition: A topic can be split into multiple partitions to distribute load and increase throughput.

Producer: The client that publishes messages to Kafka.

Broker: A Kafka server (node) that stores partitions; a cluster consists of one or more brokers.

Consumer: The client that pulls messages from brokers for processing.
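
To make the relationship between topics, partitions, and offsets concrete, here is a minimal in-memory sketch. It is a toy model, not Kafka's client API: the `Topic` class, its method names, and the `page-views` topic are all hypothetical, and CRC32 is only a stand-in for the key hash a real client would apply. What it does illustrate is that records with the same key land in the same partition, and that each partition is an append-only log addressed by offset.

```python
import zlib


class Topic:
    """Toy in-memory topic: a fixed number of append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: CRC32 stands in for the hash a real client would use.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        """Append a record and return its (partition, offset)."""
        p = self.partition_for(key)
        log = self.partitions[p]
        log.append((key, value))
        return p, len(log) - 1

    def read(self, partition, offset, max_records=10):
        """Return records starting at `offset`, as a pulling consumer would request."""
        return self.partitions[partition][offset:offset + max_records]


# Hypothetical usage: five page-view events from three users.
users = ["alice", "bob", "alice", "carol", "alice"]
topic = Topic("page-views", num_partitions=3)
placements = [topic.append(user, f"view-{i}") for i, user in enumerate(users)]

# All of alice's events share one partition, so their relative order is preserved.
alice_partition = topic.partition_for("alice")
alice_values = [v for k, v in topic.read(alice_partition, 0) if k == "alice"]
```

Because ordering is only kept within a partition, choosing the key (here, the user name) decides which events stay ordered relative to one another.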

How Kafka Works

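1. Message production: Producers publish messages to a topic; each message is sent to the broker that currently leads the target partition.

2. Message consumption: Consumers pull messages from brokers and track their position in each partition by offset.

3. Broker storage: Brokers append messages to on-disk logs and replicate them to other brokers for fault tolerance.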

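Kafka has traditionally relied on ZooKeeper for distributed coordination: controller and partition leader election, cluster metadata, and configuration synchronization. Newer releases can instead run in KRaft mode, which replaces ZooKeeper with a Raft-based quorum built into Kafka itself.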
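
The produce, store, and consume steps in this section, together with a leader failover of the kind the coordination layer arranges, can be sketched in a few lines. This is a simplified model under stated assumptions: the `Broker` and `Partition` classes are hypothetical, replication is done synchronously inside `produce` (real followers fetch from the leader on their own), and failover simply promotes the first live replica.

```python
class Broker:
    """Toy broker holding one replica of a single partition."""

    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []
        self.alive = True


class Partition:
    """Toy partition with one leader and followers that copy its log."""

    def __init__(self, brokers):
        self.replicas = brokers
        self.leader = brokers[0]

    def produce(self, message):
        # Step 1: the producer writes to the current leader.
        self.leader.log.append(message)
        # Step 3: live followers copy the record (synchronously, for simplicity).
        for broker in self.replicas:
            if broker is not self.leader and broker.alive:
                broker.log.append(message)
        return len(self.leader.log) - 1  # offset of the new record

    def fetch(self, offset, max_records=100):
        # Step 2: consumers pull from the leader, starting at an offset they track.
        return self.leader.log[offset:offset + max_records]

    def fail_leader(self):
        # Simplified failover: mark the leader dead and promote the first live replica.
        self.leader.alive = False
        self.leader = next(b for b in self.replicas if b.alive)


partition = Partition([Broker(0), Broker(1), Broker(2)])
for message in ["a", "b", "c"]:
    partition.produce(message)

# The consumer owns its position and advances it after each pull.
consumer_offset = 0
batch = partition.fetch(consumer_offset)
consumer_offset += len(batch)

# After the leader fails, the consumer resumes from the same offset on the new leader.
partition.fail_leader()
partition.produce("d")
after_failover = partition.fetch(consumer_offset)
```

The point of the sketch is that the consumer's offset, not the broker, records how far it has read, which is why a consumer can continue from a replica after a failover without losing or re-reading data in this model.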

Typical Application Scenarios

Log collection: Centralized gathering of logs from various services.

Message decoupling: Acting as a buffer between producers and consumers to improve system resilience.

User activity tracking: Recording web or app user actions such as page views, searches, and clicks.

Operational metrics: Collecting monitoring data, alerts, and reports from distributed applications.

Stream processing: Feeding real‑time processing frameworks like Spark Streaming or Storm.
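
As an illustration of the activity-tracking and stream-processing scenarios, the sketch below counts user actions in fixed one-minute windows, the kind of aggregation a stream processor might run over events read from a topic. It uses plain Python rather than any streaming framework, and the `(timestamp, user, action)` record layout and the `count_by_window` helper are assumptions made for this example.

```python
from collections import Counter, defaultdict


def count_by_window(events, window_seconds=60):
    """Count events per (window start, action) using tumbling windows.

    `events` is an iterable of (timestamp_seconds, user_id, action) tuples,
    a hypothetical record layout chosen for this example.
    """
    counts = defaultdict(Counter)
    for ts, _user, action in events:
        # Align each timestamp to the start of its window.
        window_start = ts - (ts % window_seconds)
        counts[window_start][action] += 1
    return {window: dict(counter) for window, counter in counts.items()}


# Hypothetical activity events, as they might be consumed from a topic.
events = [
    (0, "u1", "page_view"),
    (15, "u2", "search"),
    (59, "u1", "click"),
    (60, "u3", "page_view"),
    (61, "u1", "page_view"),
]
result = count_by_window(events)
```

Here the first three events fall into the window starting at second 0 and the last two into the window starting at second 60. A production pipeline would also need to handle late or out-of-order events, which this sketch ignores.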
