
Kafka Overview: Core Concepts, Architecture, Configuration, and Usage in Real-Time Computing

This article provides a comprehensive technical overview of Kafka, covering its core concepts, producer and consumer models, architecture, configuration parameters, replication mechanisms, performance optimizations, operational monitoring, tooling scripts, and related product implementations for real-time data processing.

Manbang Technology Team

Kafka is a distributed messaging middleware that decouples applications. It offers low latency, high throughput, durable storage, horizontal scalability, and ordering guarantees within each partition, and it is widely used as the data bus in real-time computing scenarios.

Core concepts

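Broker: a Kafka server node running the broker process.
Topic: a named category of messages.
Partition: an ordered, append-only sub-log of a topic; a topic's partitions are spread across brokers to enable parallel reads and writes.
ISR (in-sync replicas): the replicas of a partition, leader included, that are sufficiently caught up with the leader.
LEO (log end offset): the offset at which the next message will be appended to a replica's log.
HW (high watermark): the smallest LEO among the ISR; consumers can read only messages below the HW.
Controller: a broker elected via ZooKeeper that manages cluster metadata, partition leader election, and partition state.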
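The relationship between HW and LEO can be illustrated with a small sketch. It is a plain-Java model rather than Kafka code: the class name, method names, and broker labels are hypothetical, and it assumes the HW is simply the smallest LEO in the ISR.

```java
import java.util.Map;

public class HighWatermarkSketch {
    /** HW modeled as the smallest log end offset (LEO) among the in-sync replicas. */
    public static long highWatermark(Map<String, Long> isrLogEndOffsets) {
        return isrLogEndOffsets.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElseThrow(() -> new IllegalArgumentException("ISR must not be empty"));
    }

    /** A consumer may read an offset only if it lies below the HW. */
    public static boolean isReadable(long offset, long highWatermark) {
        return offset < highWatermark;
    }

    public static void main(String[] args) {
        // Hypothetical LEOs: the leader is at 120, the followers lag slightly.
        Map<String, Long> isr = Map.of("broker-1", 120L, "broker-2", 115L, "broker-3", 118L);
        long hw = highWatermark(isr);
        System.out.println("HW = " + hw);
        System.out.println("offset 114 readable: " + isReadable(114, hw));
        System.out.println("offset 115 readable: " + isReadable(115, hw));
    }
}
```

In this model, messages between the HW and the leader's LEO exist on the leader but are not yet visible to consumers, because not every in-sync replica has them.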

Producer model

Producers push data to Kafka. A producer fetches cluster metadata from any broker, picks the target partition, and sends to that partition's leader. Each record passes through interceptors, a serializer, and a partitioner, is batched per partition in the record accumulator, and is then sent by a background sender thread. Key configuration parameters include acks, retries, batch.size, linger.ms, and compression.type.
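As an illustration of the parameters named above, here is a minimal configuration sketch. The property keys are real producer settings, but the values are illustrative assumptions rather than recommendations from the article, and the class and method names are hypothetical. Only java.util.Properties is used, so the sketch runs without the Kafka client library.

```java
import java.util.Properties;

public class ProducerConfigSketch {
    /** Builds producer properties that favor durability over latency (assumed values). */
    public static Properties reliableProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");             // wait for all in-sync replicas
        props.put("retries", "3");            // retry transient send failures
        props.put("batch.size", "16384");     // per-partition batch size in bytes
        props.put("linger.ms", "5");          // wait briefly so batches can fill
        props.put("compression.type", "lz4"); // compress whole batches
        return props;
    }

    public static void main(String[] args) {
        Properties props = reliableProducerProps("localhost:9092");
        props.stringPropertyNames().stream().sorted()
                .forEach(k -> System.out.println(k + " = " + props.getProperty(k)));
    }
}
```

With the kafka-clients library on the classpath, these properties would typically be passed to the KafkaProducer constructor.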

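import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Note: this snippet uses the consumer API, not the producer API.
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// Must not be below the broker's group.min.session.timeout.ms (6000 ms by default).
props.put("session.timeout.ms", "10000");
props.put("enable.auto.commit", "true");      // commit offsets automatically...
props.put("auto.commit.interval.ms", "10000"); // ...every 10 seconds
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
boolean isRunning = true;
while (isRunning) {
    // Wait up to 100 ms for new records from the subscribed topics.
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    process(records); // application-defined handler
}
consumer.close();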

Consumer model

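Consumers pull data from brokers. A consumer either subscribes to topics, letting the group manage partition assignment, or assigns partitions to itself manually, and it controls its position with methods such as poll, commitSync/commitAsync, seek, and pause/resume. Consumer groups provide load-balanced consumption: each partition is consumed by exactly one member of a group. When members join, leave, or fail, a rebalance runs in which the group coordinator (a broker) selects one member as group leader, the leader computes the partition assignment, and the coordinator distributes it to all members.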
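How a group leader might divide partitions among members can be sketched as follows. The sketch is modeled on a range-style strategy for a single topic and is an assumption for illustration: the actual strategy is pluggable, and the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class RangeAssignmentSketch {
    /** Gives each member a contiguous range of partitions; the first members absorb any remainder. */
    public static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        List<String> sorted = new ArrayList<>(new TreeSet<>(consumers)); // stable member order
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        if (sorted.isEmpty()) {
            return assignment;
        }
        int base = partitionCount / sorted.size();
        int extra = partitionCount % sorted.size();
        int next = 0;
        for (int i = 0; i < sorted.size(); i++) {
            int count = base + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int p = 0; p < count; p++) {
                owned.add(next++);
            }
            assignment.put(sorted.get(i), owned);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Three members share seven partitions.
        System.out.println(assign(List.of("c1", "c2", "c3"), 7));
        // After c2 fails, a rebalance spreads its partitions over the survivors.
        System.out.println(assign(List.of("c1", "c3"), 7));
    }
}
```

Rerunning the assignment with a changed member list is what a rebalance amounts to in this model: every partition always has exactly one owner within the group.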

Server side

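On the broker, an acceptor thread accepts client connections and hands them to network (processor) threads, which read requests and place them in a shared request channel. A pool of request handler threads takes requests from that channel and processes them asynchronously, dispatching each to the handler for its type: produce, fetch, metadata, leader-and-ISR, offset commit/fetch, group coordination, and so on. The network layer is built on Java NIO selectors, with request and response queues between the network and handler threads, to achieve high concurrency.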
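The hand-off between network threads and handler threads can be sketched with a bounded queue. This is a simplified plain-Java model, not the broker's code: the Request type, the two API names it handles, and the response strings are hypothetical, and the calling thread stands in for a network thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;

public class RequestChannelSketch {
    /** Hypothetical request: an API name plus an opaque payload. */
    public static final class Request {
        final String api;
        final String payload;
        public Request(String api, String payload) { this.api = api; this.payload = payload; }
    }

    /** Routes a request to the logic for its API type. */
    static String dispatch(Request r) {
        switch (r.api) {
            case "PRODUCE": return "appended:" + r.payload;
            case "FETCH":   return "fetched:" + r.payload;
            default:        return "unsupported:" + r.api;
        }
    }

    /** Enqueues all requests, lets handler threads drain the queue, returns sorted responses. */
    public static List<String> serve(List<Request> requests, int handlerThreads) {
        if (handlerThreads < 1) throw new IllegalArgumentException("need at least one handler");
        BlockingQueue<Request> requestChannel = new ArrayBlockingQueue<>(16);
        ConcurrentLinkedQueue<String> responses = new ConcurrentLinkedQueue<>();
        CountDownLatch done = new CountDownLatch(requests.size());
        List<Thread> handlers = new ArrayList<>();
        for (int i = 0; i < handlerThreads; i++) {
            Thread handler = new Thread(() -> {
                try {
                    while (true) {
                        responses.add(dispatch(requestChannel.take()));
                        done.countDown();
                    }
                } catch (InterruptedException e) {
                    // Interrupted on shutdown: leave the loop.
                }
            });
            handler.setDaemon(true);
            handler.start();
            handlers.add(handler);
        }
        try {
            for (Request r : requests) requestChannel.put(r); // the network-thread role
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while serving", e);
        } finally {
            handlers.forEach(Thread::interrupt);
        }
        List<String> sorted = new ArrayList<>(responses);
        Collections.sort(sorted); // completion order is nondeterministic, so sort for display
        return sorted;
    }

    public static void main(String[] args) {
        List<Request> requests = List.of(new Request("PRODUCE", "m1"),
                new Request("FETCH", "orders-0"), new Request("UNKNOWN", ""));
        System.out.println(serve(requests, 2));
    }
}
```

The bounded queue is the design point: when handlers fall behind, the network side blocks on put instead of buffering requests without limit.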

Zero‑copy and I/O optimization

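Kafka memory-maps (mmap) its index files and sends log data to consumers chiefly through Java NIO's FileChannel.transferTo, which maps to the sendfile system call on Linux. Data moves from the page cache to the socket without being copied through user space, which avoids extra copies, reduces context switches, and improves throughput.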
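The transferTo call can be demonstrated with the standard library alone. In this sketch a second file stands in for the consumer's socket channel, which is an assumption made so that the example runs without a network; the class name, method names, and temp-file prefixes are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySketch {
    /** Moves a whole file channel-to-channel; loops because one call may transfer fewer bytes than asked. */
    public static long transfer(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target, StandardOpenOption.WRITE,
                     StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            long size = in.size();
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }

    /** Writes content to a temp file, transfers it to another temp file, and reads it back. */
    public static String roundTrip(String content) {
        try {
            Path source = Files.createTempFile("segment", ".log");
            Path target = Files.createTempFile("socket", ".out");
            try {
                Files.write(source, content.getBytes(StandardCharsets.UTF_8));
                transfer(source, target);
                return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(source);
                Files.deleteIfExists(target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello kafka"));
    }
}
```

The application never reads the bytes into its own buffer; it only tells the operating system which range of the source to move to the destination channel.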

Replication and fault tolerance

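Replication protects data against broker failure. The ISR set balances safety and performance: a write counts as committed once every replica in the ISR has it, so a slow replica that drops out of the ISR no longer holds up writes. The controller manages leader election and replica state. min.insync.replicas sets how many in-sync replicas must exist for a write with acks=all to be accepted, and unclean.leader.election.enable decides whether a replica outside the ISR may become leader, trading possible data loss for availability. Various other broker settings tune reliability and performance.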
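The effect of these two settings can be modeled as two small predicates. This is a deliberately simplified sketch with hypothetical names; a real broker checks more conditions (for example that the partition has a live leader) before accepting a write or electing a leader.

```java
public class MinIsrSketch {
    /** With acks=all (or -1) a write needs at least min.insync.replicas replicas in the ISR. */
    public static boolean acceptsWrite(String acks, int isrSize, int minInsyncReplicas) {
        boolean requiresAllIsr = "all".equals(acks) || "-1".equals(acks);
        if (!requiresAllIsr) {
            return true; // acks=0 or acks=1: min.insync.replicas is not enforced
        }
        return isrSize >= minInsyncReplicas;
    }

    /** A replica outside the ISR may lead only if unclean leader election is enabled. */
    public static boolean mayBecomeLeader(boolean inIsr, boolean uncleanElectionEnabled) {
        return inIsr || uncleanElectionEnabled;
    }

    public static void main(String[] args) {
        // Replication factor 3, min.insync.replicas=2, one follower has fallen out of the ISR.
        System.out.println("ISR=2, acks=all: " + acceptsWrite("all", 2, 2));
        // A second follower drops out: durable writes are now rejected.
        System.out.println("ISR=1, acks=all: " + acceptsWrite("all", 1, 2));
        System.out.println("ISR=1, acks=1:   " + acceptsWrite("1", 1, 2));
        System.out.println("out-of-sync leader, unclean disabled: " + mayBecomeLeader(false, false));
    }
}
```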

Operational parameters

Operational tuning covers broker settings such as buffer sizes and log retention, JVM settings such as GC options, and monitoring of key metrics: leader elections, ISR changes, I/O rates, and CPU and disk usage.
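Brokers expose their metrics as JMX MBeans. The sketch below builds the object names for a few metrics that match the categories above. The selection is an assumption for illustration and not the article's own list, and the class and method names are hypothetical. The sketch only constructs and parses the names; reading values would require a JMX connection to a running broker, which is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class BrokerMetricNamesSketch {
    // Leader elections, ISR shrinkage, under-replication, and inbound byte rate.
    private static final String[] NAMES = {
        "kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs",
        "kafka.server:type=ReplicaManager,name=IsrShrinksPerSec",
        "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions",
        "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec"
    };

    /** Parses each name into a JMX ObjectName, failing fast on a malformed entry. */
    public static List<ObjectName> metricNames() {
        List<ObjectName> result = new ArrayList<>();
        for (String name : NAMES) {
            try {
                result.add(new ObjectName(name));
            } catch (MalformedObjectNameException e) {
                throw new IllegalArgumentException("bad metric name: " + name, e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (ObjectName n : metricNames()) {
            System.out.println(n.getDomain() + " / " + n.getKeyProperty("type")
                    + " / " + n.getKeyProperty("name"));
        }
    }
}
```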

Tools and scripts

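Kafka ships with command-line utilities for topic management (kafka-topics.sh), broker start and stop (kafka-server-start.sh, kafka-server-stop.sh), preferred-replica election (kafka-preferred-replica-election.sh in older releases), partition reassignment (kafka-reassign-partitions.sh), producer and consumer performance testing (kafka-producer-perf-test.sh, kafka-consumer-perf-test.sh), and log inspection (the DumpLogSegments tool).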

Products

The author’s team built a Kafka SDK with circuit‑breaker, degradation, rate‑limiting and monitoring features, and a metadata management platform for topic administration and permission handling.

Author

Dong Yanfeng, big‑data platform architect at Manbang Group.
