
An Overview of Kafka: Introduction, Design Principles, and Common Issues

This article introduces Kafka, explains its core concepts and design principles, outlines typical use cases, and discusses common operational problems and troubleshooting tips for this high‑throughput distributed messaging system.

360 Quality & Efficiency

Kafka is one of the most widely used distributed real-time messaging systems at internet companies today, offering high throughput and reliability when processing massive volumes of concurrent requests. This article examines Kafka from three perspectives: an introduction with applicable scenarios, its design principles, and common issues with reflections.

1. Kafka Introduction and Use Cases

According to the official website, Kafka is a distributed publish-subscribe messaging system characterized by high throughput, persistence (messages are stored on disk and can serve both batch and real-time processing), scalability, and fault tolerance.

Key Terminology:

Message : The basic data unit, consisting of a key-value pair. For efficiency, messages are written to Kafka in batches; a batch is a group of messages that belong to the same Topic and Partition.

Topic : A category for messages, similar to a database table or folder; messages of different topics are stored separately.

Partition : A physical subdivision of a topic. A topic's messages are spread across its partitions, and each partition is an append-only log, so FIFO order is guaranteed within a partition but not across the partitions of a topic.

Offset : A monotonically increasing integer that uniquely identifies a message within a partition.

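Broker : A server in the Kafka cluster that stores messages; it does not maintain per-message delivery state (for example, whether a message has been consumed). A consumer's position is tracked as an offset instead.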

Producer : Publishes messages to brokers.

Consumer : Reads messages from brokers.

Consumer Group : A set of consumers sharing the consumption of partitions; each partition is consumed by only one consumer within the group.
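To make the relationship between topics, partitions, and offsets concrete, here is a minimal in-memory sketch. It is a toy model, not a Kafka client: the Topic class, its methods, and the "page-views" topic name are all hypothetical, and the only assumption carried over from the definitions above is that each partition is an append-only log whose positions serve as offsets.

```python
class Topic:
    """Toy in-memory stand-in for a Kafka topic (not a real client API)."""

    def __init__(self, name, num_partitions):
        self.name = name
        # One append-only list per partition; the list index is the offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, key, value):
        """Append a key-value message and return the offset it was assigned."""
        log = self.partitions[partition]
        log.append((key, value))
        return len(log) - 1

    def read(self, partition, offset, max_messages=10):
        """Return up to max_messages starting at offset, in log order."""
        return self.partitions[partition][offset:offset + max_messages]


topic = Topic("page-views", num_partitions=2)
first = topic.append(0, "user-1", "home")
second = topic.append(0, "user-1", "cart")
other = topic.append(1, "user-2", "home")
```

Offsets are assigned independently in each partition, so the first message in partition 1 gets offset 0 even though partition 0 already holds two messages. That is why an offset identifies a message only together with its partition.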

Kafka Use Cases

Push notifications

High‑throughput data ingestion

Large buffering layer

Storing messages for offline analysis in Hadoop or traditional data warehouses

NGINX log collection

2. Kafka Design Principles

In a simple Kafka cluster architecture, producers push data to brokers, which host multiple topics, and consumers pull data from brokers. The following mechanisms are noteworthy:

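Message-Partition Assignment : If a message has a key, the hash of the key modulo the number of partitions determines the target partition, so the same key always maps to the same partition as long as the partition count does not change. If the key is null, messages are spread across partitions: round-robin in older clients, while the default partitioner since Kafka 2.4 fills a batch for one partition before moving to the next.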

Message Retention Policy : Kafka retains messages either for a configured time period or until the log reaches a configured size.

Message Processing :

Messages are appended to the log in the order they are received.

Consumers read messages in the same order as they appear in the log.

With a replication factor of N, up to N-1 broker failures can be tolerated without losing committed messages.

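Kafka provides at-least-once delivery by default: if a consumer crashes after processing messages but before committing their offsets, those messages are delivered again after it restarts, so duplicates may occur.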

Only one consumer in a consumer group reads from a given partition, guaranteeing ordered consumption within that partition.

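Acknowledgments (acks) are returned by the broker to the producer to confirm that a message was written successfully; consumers record successful consumption by committing offsets.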
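The key-based partition assignment described in this section can be sketched in a few lines. This is an illustration under stated assumptions rather than Kafka's actual partitioner: choose_partition and RoundRobin are hypothetical names, and zlib.crc32 is used only because it is deterministic across runs; it is not the hash function a real Kafka client applies.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a non-null key to a partition: hash(key) modulo partition count.

    zlib.crc32 is a stand-in hash chosen for determinism; it is not the
    hash a real Kafka client uses.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class RoundRobin:
    """Cycle through partitions for messages that have no key."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._next = 0

    def choose(self):
        partition = self._next
        self._next = (self._next + 1) % self.num_partitions
        return partition


same_key = [choose_partition("user-42", 6) for _ in range(3)]
keyless = RoundRobin(3)
rotation = [keyless.choose() for _ in range(4)]
```

Every entry in same_key is the same partition number, which is what preserves per-key ordering. Because the result depends on num_partitions, adding partitions to a topic can move an existing key to a different partition.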

3. Common Issues and Reflections

Why don't brokers push messages and mark them as consumed? Pull-based consumption lets clients control both the processing speed and the offset they read from, which enables replay or selective consumption.
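A small simulation may help show what client-controlled speed and offset mean in practice. The PullConsumer class below is hypothetical and reads from a plain Python list standing in for one partition's log; it is a sketch of the pull model, not the API of any Kafka client.

```python
class PullConsumer:
    """Toy pull-based reader over one partition log (a plain Python list)."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.position = start_offset  # next offset to read; owned by the client

    def poll(self, max_messages):
        """Fetch at most max_messages; the client decides how much to take."""
        batch = self.log[self.position:self.position + max_messages]
        self.position += len(batch)
        return batch

    def seek(self, offset):
        """Move the read position, for example back to 0 to replay messages."""
        if not 0 <= offset <= len(self.log):
            raise ValueError("offset out of range")
        self.position = offset


log = ["m0", "m1", "m2", "m3", "m4"]
consumer = PullConsumer(log)
first_batch = consumer.poll(2)   # a slow consumer takes two messages at a time
consumer.seek(0)                 # rewind to replay from the beginning
replayed = consumer.poll(3)
```

Because the read position lives in the client, rewinding is just a matter of changing one number. With a push model the broker would have to remember what each consumer had received and how fast it could accept more.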

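Why can only one consumer in a group read from a partition? Offsets are maintained per consumer group and partition, so a partition has a single read position for the whole group. Assigning each partition to exactly one consumer in the group avoids concurrent consumption of the same partition, which would otherwise duplicate work and break ordering.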
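The one-consumer-per-partition rule can be illustrated with a simplified assignment function. Kafka ships several assignment strategies; the round-robin scheme below is only a sketch, and assign_partitions and the consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Give every partition to exactly one consumer, round-robin style."""
    if not consumers:
        raise ValueError("a group needs at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Four partitions shared by three consumers in one group.
balanced = assign_partitions([0, 1, 2, 3], ["c1", "c2", "c3"])

# More consumers than partitions: the extra consumer receives nothing.
crowded = assign_partitions([0, 1], ["c1", "c2", "c3"])
```

In the second case consumer c3 sits idle, which shows that the number of partitions caps the useful parallelism of a consumer group.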

Why might a sent message not be received?

Check that producer and consumer use the same topic and cluster.

Verify the partition assignment.

Ensure the message was actually written to the broker log (inspect timestamps).

Confirm producer batching settings and message format.

Check consumer offset configuration and pull logic.
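One frequent cause behind the offset check above is a consumer group with no committed offset that starts reading at the end of the log, so messages written earlier are never seen. The sketch below models that decision. starting_offset is a hypothetical helper, its reset_policy argument mirrors the idea behind the auto.offset.reset consumer setting, and real clients handle more cases than this simplified version.

```python
def starting_offset(committed, log_start, log_end, reset_policy="latest"):
    """Return the offset a consumer would start reading from.

    committed is the group's saved offset for the partition, or None if the
    group has never committed one. log_end is the offset the next written
    message will receive. reset_policy is "earliest" or "latest".
    """
    if committed is not None and log_start <= committed <= log_end:
        return committed
    if reset_policy == "earliest":
        return log_start
    if reset_policy == "latest":
        return log_end
    raise ValueError(f"unknown reset policy: {reset_policy}")


# A partition that already holds 100 messages (offsets 0 to 99).
new_group_latest = starting_offset(None, 0, 100, "latest")
new_group_earliest = starting_offset(None, 0, 100, "earliest")
existing_group = starting_offset(42, 0, 100, "latest")
```

With the "latest" policy a brand-new group starts at offset 100 and skips all 100 existing messages, while "earliest" starts at 0. A group that has already committed offset 42 resumes there regardless of the policy.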

4. Summary

This article briefly described Kafka's fundamentals and design principles. In practice, enterprises encounter many additional challenges beyond those covered here, and the points above are offered as a practical summary for sharing and reference.

