
Introduction to Apache Kafka: Core Concepts, Architecture, and APIs

This article provides a comprehensive overview of Apache Kafka, covering its fundamental capabilities, typical use cases, core components, key APIs, and essential concepts such as topics, partitions, segments, brokers, producers, and consumers, illustrated with diagrams.

Architecture Digest

What is Kafka

Kafka is a distributed streaming platform that offers three core capabilities: a publish‑subscribe record stream similar to a message queue, fault‑tolerant storage of those records, and real‑time processing of the streams.

Kafka Applications

Used as a messaging system

Used as a storage system

Used as a stream processor

Kafka is typically used to build reliable real-time data pipelines that move data between systems or applications, and to build streaming applications that transform or react to those data streams.

Kafka as a Messaging System

When used as a messaging system, Kafka consists of three basic components:

Producer – the client that publishes messages

Broker – the server that receives and stores messages from producers

Consumer – the client that reads messages from brokers

In large systems, many subsystems need to exchange data; a message‑passing system like Kafka simplifies and organizes this interaction.

Kafka runs as a cluster on one or more servers, which may span one or more data centers. The cluster stores messages in logical containers called topics. Each message record contains a key, a value, and a timestamp.
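As a rough illustration of the producer, broker, and consumer roles described above, the following Python sketch models a broker as an in-memory dictionary of topics. It is a toy model and not the Kafka client API: the names `Record`, `ToyBroker`, `publish`, and `read` are hypothetical and exist only for this example.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """A message record with a key, a value, and a timestamp."""
    key: Optional[str]
    value: str
    timestamp: float = field(default_factory=time.time)


class ToyBroker:
    """Hypothetical in-memory stand-in for a broker (a real broker persists to disk)."""

    def __init__(self) -> None:
        # topic name -> records in arrival order
        self._topics = defaultdict(list)

    def publish(self, topic: str, record: Record) -> int:
        """Producer side: store the record and return its position in the topic."""
        self._topics[topic].append(record)
        return len(self._topics[topic]) - 1

    def read(self, topic: str, start: int = 0) -> List[Record]:
        """Consumer side: return every record from position `start` onward."""
        return self._topics[topic][start:]


broker = ToyBroker()
broker.publish("orders", Record(key="user-1", value="created"))
broker.publish("orders", Record(key="user-1", value="paid"))
values = [r.value for r in broker.read("orders")]
```

The producer and consumer never talk to each other directly here; both only know the topic name, which is the decoupling a messaging system is meant to provide.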

Core APIs

Kafka provides four core APIs:

Producer API – allows applications to send message records to one or more topics

Consumer API – allows applications to subscribe to topics and process the resulting record streams

Streams API – enables applications to act as stream processors, consuming input streams from topics and producing output streams

Connector API (Kafka Connect) – enables building and running reusable connectors that link Kafka topics to external systems such as relational databases

Fundamental Kafka Concepts

Topic

A topic is a logical category that groups related messages, similar to a table in a database or a folder in a file system.

Partition

Each topic is divided into one or more partitions, each of which is an append‑only log stored on disk. New messages are appended to the end of a partition, so each partition preserves the order in which its messages were written.

Note: Because a topic may contain many partitions, global ordering across the entire topic cannot be guaranteed, but ordering is preserved within each individual partition.

Partitions can be distributed across multiple servers, so a single topic can span several machines and scale its storage and throughput beyond what one server can handle.
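The ordering note above can be made concrete with a small Python sketch. It assumes a hypothetical `ToyTopic` class in which every partition is a plain list and the offset of a message is simply its index in that list; none of these names come from Kafka itself.

```python
class ToyTopic:
    """Hypothetical topic made of independent append-only partition logs."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition: int, value: str) -> int:
        """Append to one partition and return the offset within that partition."""
        log = self.partitions[partition]
        log.append(value)
        return len(log) - 1


topic = ToyTopic(num_partitions=2)
# Messages arrive in the order a, b, c, but are spread over two partitions.
offsets = [topic.append(0, "a"), topic.append(1, "b"), topic.append(0, "c")]
```

Each partition numbers its own messages starting from zero, so "a" and "b" both receive offset 0. Within partition 0 the order "a" then "c" is preserved, but nothing in the stored data records that "b" arrived between them, which is why ordering holds per partition rather than across the whole topic.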

Segment

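Partitions are further broken into segments, each of which is a file on disk. Segments are not a fixed size: messages are appended to the newest (active) segment, and the broker rolls over to a new segment once the active one reaches a configured size limit (the `log.segment.bytes` broker setting) or age limit.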
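How a partition grows as a sequence of segments can be sketched as follows. This is a simplified model under stated assumptions: `ToyPartition` and `max_segment_bytes` are hypothetical names, segments are lists in memory rather than files, and the roll-over rule is based on size only.

```python
class ToyPartition:
    """Hypothetical partition stored as a list of size-limited segments."""

    def __init__(self, max_segment_bytes: int) -> None:
        self.max_segment_bytes = max_segment_bytes
        self.segments = [[]]      # the last segment is the active one
        self._active_size = 0     # bytes written to the active segment

    def append(self, payload: bytes) -> None:
        would_overflow = self._active_size + len(payload) > self.max_segment_bytes
        # Roll to a fresh segment if the active one is non-empty and would overflow.
        if self._active_size > 0 and would_overflow:
            self.segments.append([])
            self._active_size = 0
        self.segments[-1].append(payload)
        self._active_size += len(payload)


partition = ToyPartition(max_segment_bytes=10)
for message in [b"aaaa", b"bbbb", b"cccc", b"dddd"]:
    partition.append(message)
segment_counts = [len(segment) for segment in partition.segments]
```

With a 10-byte limit and 4-byte messages, two messages fit in each segment, so the four messages end up in two segments. Only the last segment is ever written to; older segments stay unchanged, which makes them easy to delete or archive as whole units.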

Broker

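A Kafka cluster consists of one or more brokers (servers). Brokers receive messages from producers, assign offsets, persist the messages to disk, and serve consumer read requests. For each partition, one broker acts as the leader and handles writes to that partition; brokers that hold the other replicas act as followers and copy the leader's log. If the leader fails, one of the followers takes over as the new leader.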
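The leader role and failover for a single partition can be pictured with the toy model below. It is deliberately simplified and makes several assumptions that do not match a real cluster: replication is done synchronously inside `append`, the new leader is simply the surviving broker with the lowest id, and the names `ToyReplicaSet`, `append`, and `fail` are hypothetical.

```python
from typing import Dict, List


class ToyReplicaSet:
    """Hypothetical replica set for one partition: one leader, the rest followers."""

    def __init__(self, broker_ids: List[int]) -> None:
        self.logs: Dict[int, List[str]] = {b: [] for b in broker_ids}
        self.leader = broker_ids[0]

    def append(self, value: str) -> int:
        """The leader assigns the offset; every replica then stores the message."""
        offset = len(self.logs[self.leader])
        for log in self.logs.values():
            log.append(value)
        return offset

    def fail(self, broker_id: int) -> None:
        """Remove a broker; if it was the leader, promote a surviving follower."""
        del self.logs[broker_id]
        if not self.logs:
            raise RuntimeError("no replicas left for this partition")
        if broker_id == self.leader:
            self.leader = min(self.logs)  # assumption: lowest id wins


replicas = ToyReplicaSet([1, 2, 3])
first_offset = replicas.append("a")
replicas.fail(1)                      # the leader goes down
second_offset = replicas.append("b")  # the new leader continues the same log
```

Because the followers already hold a copy of the log, the new leader continues numbering where the old one stopped, and consumers see no gap in offsets.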

Producer

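The producer publishes messages to a topic. By default, messages without a key are spread across the topic's partitions to balance load, while messages with a key are routed by a hash of the key, so all messages with the same key land in the same partition. The producer can also target a specific partition explicitly when needed.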
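One way to picture how a producer chooses a partition is the sketch below. It is an illustration under assumptions, not the real client logic: `make_partitioner` and `choose` are hypothetical names, unkeyed messages are spread round-robin, and `zlib.crc32` stands in for the hash function (the Kafka Java client uses murmur2 for keyed messages).

```python
import itertools
import zlib
from typing import Callable, Optional


def make_partitioner(num_partitions: int) -> Callable[..., int]:
    """Return a hypothetical partition chooser for a topic with `num_partitions`."""
    counter = itertools.count()

    def choose(key: Optional[str] = None, partition: Optional[int] = None) -> int:
        if partition is not None:
            return partition                  # explicit target wins
        if key is not None:
            # Same key -> same hash -> same partition (crc32 is a stand-in).
            return zlib.crc32(key.encode("utf-8")) % num_partitions
        return next(counter) % num_partitions  # no key: spread evenly

    return choose


choose = make_partitioner(3)
same_partition = choose(key="user-1") == choose(key="user-1")
explicit = choose(partition=2)
```

Routing by key matters because ordering is only preserved within a partition: sending every event for `user-1` to the same partition keeps that user's events in order.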

Consumer

The consumer reads messages from one or more topics. For a given topic, a consumer is assigned one or more partitions and reads each of them in offset order, keeping track of the offset it has reached, which ensures ordered consumption within each partition.
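Ordered consumption within a partition can be sketched by having the consumer remember, per partition, the offset of the next message to read. The class `ToyConsumer`, its `poll` method, and the `max_records` parameter are hypothetical names chosen for this example and are not the Kafka Consumer API.

```python
from typing import Dict, List


class ToyConsumer:
    """Hypothetical consumer that tracks its read position for each partition."""

    def __init__(self, partitions: Dict[int, List[str]]) -> None:
        self._partitions = partitions
        # Next offset to read, per partition; start at the beginning of each log.
        self.positions = {p: 0 for p in partitions}

    def poll(self, partition: int, max_records: int = 10) -> List[str]:
        """Return up to `max_records` unread messages and advance the position."""
        start = self.positions[partition]
        batch = self._partitions[partition][start:start + max_records]
        self.positions[partition] = start + len(batch)
        return batch


logs = {0: ["a", "c"], 1: ["b"]}
consumer = ToyConsumer(logs)
first = consumer.poll(0, max_records=1)
second = consumer.poll(0)
```

Because the position only moves forward through one partition's log, messages from that partition are always delivered in the order they were appended, and a later poll picks up exactly where the previous one stopped.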

Source: SegmentFault https://segmentfault.com/a/1190000020718980