Kafka Overview: Core Concepts, Architecture, Configuration, and Usage in Real-Time Computing
This article provides a comprehensive technical overview of Kafka, covering its core concepts, producer and consumer models, architecture, configuration parameters, replication mechanisms, performance optimizations, operational monitoring, tooling scripts, and related product implementations for real-time data processing.
Kafka is a distributed messaging middleware that decouples applications, offering low latency, high throughput, persistence, scalability, and ordering guarantees, and is widely used as a data bus in real-time computing scenarios.
Core concepts
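Broker: a Kafka node that runs the service process.
Topic: a named category of messages.
Partition: an ordered sub-log of a topic; a topic's partitions are spread across brokers to enable parallel reads and writes.
ISR (in-sync replicas): the subset of a partition's replicas, including the leader, that are caught up with the leader.
HW (high watermark) and LEO (log end offset): the LEO is the offset of the next message to be appended to a replica's log; the HW is the offset up to which messages have been replicated to every ISR member, and consumers can only read messages below the HW.
Controller: a broker elected via ZooKeeper that manages metadata, leader election and partition state.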
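The relationship between ISR, LEO and HW can be illustrated with a small model. This is a minimal sketch, not Kafka's internal code: the class `HighWatermarkSketch` and its methods are hypothetical names, and it assumes the high watermark is simply the smallest LEO among the replicas currently in the ISR.

```java
import java.util.Map;
import java.util.Set;

/** Illustrative model only; these are not Kafka's internal classes. */
final class HighWatermarkSketch {

    /** Returns the smallest log end offset (LEO) among the replicas in the ISR. */
    static long highWatermark(Map<Integer, Long> leoByReplica, Set<Integer> isr) {
        if (isr.isEmpty()) {
            throw new IllegalStateException("ISR is empty");
        }
        long hw = Long.MAX_VALUE;
        for (int replicaId : isr) {
            Long leo = leoByReplica.get(replicaId);
            if (leo == null) {
                throw new IllegalArgumentException("unknown replica " + replicaId);
            }
            hw = Math.min(hw, leo);
        }
        return hw;
    }

    /** In this model a consumer may read offsets strictly below the high watermark. */
    static boolean isReadable(long offset, long hw) {
        return offset >= 0 && offset < hw;
    }

    public static void main(String[] args) {
        Map<Integer, Long> leo = Map.of(1, 120L, 2, 115L, 3, 90L);
        // Replica 3 has fallen behind and is assumed to have been dropped from the ISR.
        long hw = highWatermark(leo, Set.of(1, 2));
        System.out.println("HW = " + hw);
    }
}
```

In this model a lagging replica that is removed from the ISR no longer holds the high watermark back, which is why the ISR membership matters for how quickly new messages become visible to consumers.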
Producer model
Producers push data to Kafka. A producer fetches cluster metadata from any broker, chooses a partition for each record, and sends the record to that partition's leader. Records pass through interceptors, a serializer and a partitioner, are batched in an accumulator, and are sent by a background sender thread. Behavior is tuned through configuration parameters such as acks, retries, batch.size, linger.ms and compression.type.
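How batch.size and linger.ms interact in the accumulator can be sketched with a simplified single-partition model. This is an illustration under stated assumptions, not Kafka's RecordAccumulator: the class `BatchingSketch` and its methods are hypothetical, and time is passed in explicitly so the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of producer-side batching for one partition. */
final class BatchingSketch {
    private final int batchSizeBytes; // plays the role of batch.size
    private final long lingerMs;      // plays the role of linger.ms
    private final List<byte[]> current = new ArrayList<>();
    private int currentBytes = 0;
    private long createdAtMs = -1;

    BatchingSketch(int batchSizeBytes, long lingerMs) {
        this.batchSizeBytes = batchSizeBytes;
        this.lingerMs = lingerMs;
    }

    /** Buffers a record; the first record of a batch starts the linger clock. */
    void append(byte[] record, long nowMs) {
        if (current.isEmpty()) {
            createdAtMs = nowMs;
        }
        current.add(record);
        currentBytes += record.length;
    }

    /** A batch is ready once it is full or has waited at least lingerMs. */
    boolean isReady(long nowMs) {
        if (current.isEmpty()) {
            return false;
        }
        return currentBytes >= batchSizeBytes || nowMs - createdAtMs >= lingerMs;
    }

    /** Hands the buffered records to the sender and starts a new, empty batch. */
    List<byte[]> drain() {
        List<byte[]> batch = new ArrayList<>(current);
        current.clear();
        currentBytes = 0;
        createdAtMs = -1;
        return batch;
    }
}
```

A sender loop would poll `isReady` and call `drain` when it returns true. A larger batch size or linger time trades latency for fewer, larger requests.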
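Consumer model
Consumers pull data; they can subscribe to topics or manually assign partitions, and support methods such as poll, commit, seek, pause/resume. Consumer groups enable load-balanced consumption, with a rebalance protocol that elects a group leader, assigns partitions, and handles failures.
```java
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
// The Java consumer bootstraps from bootstrap.servers, not the legacy metadata.broker.list key.
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
// Must be larger than heartbeat.interval.ms (default 3000) and within the broker's allowed range.
props.put("session.timeout.ms", "10000");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "10000");
// Deserializers are required; String is assumed here for both keys and values.
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
boolean isRunning = true; // application code sets this to false to stop the loop
try {
    while (isRunning) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        process(records); // process(...) is application-defined
    }
} finally {
    consumer.close(); // leaves the group and releases network resources
}
```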
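The partition assignment step of a rebalance can be sketched as follows. This is a simplified, range-style assignment for a single topic, written as an assumption-laden illustration rather than Kafka's actual assignor code; `RangeAssignmentSketch` and `assign` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Range-style assignment of one topic's partitions to group members (illustrative only). */
final class RangeAssignmentSketch {

    /**
     * Sorts member ids, then gives each member a contiguous range of partitions.
     * When partitions do not divide evenly, the first members get one extra partition.
     */
    static Map<String, List<Integer>> assign(Collection<String> members, int numPartitions) {
        List<String> sorted = new ArrayList<>(new TreeSet<>(members));
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        if (sorted.isEmpty()) {
            return result;
        }
        int perMember = numPartitions / sorted.size();
        int extra = numPartitions % sorted.size();
        int next = 0;
        for (int i = 0; i < sorted.size(); i++) {
            int count = perMember + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                owned.add(next++);
            }
            result.put(sorted.get(i), owned);
        }
        return result;
    }
}
```

Calling `assign` again with a smaller member list models what happens after a consumer fails: the remaining members take over its partitions in the next rebalance.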
Server side
The Kafka broker accepts client connections on acceptor threads; network processor threads read the requests and place them in a request channel (a queue), and a pool of handler threads processes them asynchronously. Handlers cover produce, fetch, metadata, leader-and-ISR, offset commit/fetch, group coordination, etc. The network layer uses Java NIO with selector pools and event queues to achieve high concurrency.
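The hand-off between the network side and the request handlers can be sketched as a bounded queue drained by a thread pool. This toy model is an assumption for illustration, not Kafka's RequestChannel: the class `RequestChannelSketch`, its string request types and its stop sentinel are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of a request queue drained by handler threads. */
final class RequestChannelSketch {
    private static final String STOP = "__stop__"; // sentinel telling one handler to exit
    private final BlockingQueue<String> requestQueue;
    private final ExecutorService handlers;
    private final int handlerThreads;
    private final Map<String, AtomicInteger> handled = new ConcurrentHashMap<>();

    RequestChannelSketch(int queueCapacity, int handlerThreads) {
        this.requestQueue = new ArrayBlockingQueue<>(queueCapacity);
        this.handlerThreads = handlerThreads;
        this.handlers = Executors.newFixedThreadPool(handlerThreads);
        for (int i = 0; i < handlerThreads; i++) {
            handlers.execute(this::handlerLoop);
        }
    }

    /** Network side: blocks when the queue is full, which applies back-pressure. */
    void enqueue(String requestType) throws InterruptedException {
        requestQueue.put(requestType);
    }

    private void handlerLoop() {
        try {
            while (true) {
                String type = requestQueue.take();
                if (type.equals(STOP)) {
                    return;
                }
                // A real handler would dispatch on the request type (produce, fetch, ...).
                handled.computeIfAbsent(type, k -> new AtomicInteger()).incrementAndGet();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Lets queued requests finish, stops the handlers, and returns counts per request type. */
    Map<String, Integer> shutdown() throws InterruptedException {
        for (int i = 0; i < handlerThreads; i++) {
            requestQueue.put(STOP);
        }
        handlers.shutdown();
        handlers.awaitTermination(10, TimeUnit.SECONDS);
        Map<String, Integer> counts = new TreeMap<>();
        handled.forEach((type, n) -> counts.put(type, n.get()));
        return counts;
    }
}
```

Decoupling the threads that read from sockets from the threads that do the work means a slow request ties up only one handler, while the bounded queue keeps memory use predictable under load.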
Zero‑copy and I/O optimization
Kafka uses memory-mapped files (mmap) and Java NIO transferTo/transferFrom to avoid extra data copies between kernel space and user space, reducing context switches and improving throughput.
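The transferTo call mentioned above is part of the standard `java.nio.channels.FileChannel` API. The following minimal sketch shows how a byte range of a file can be pushed to another channel without reading it into an application buffer; the class `ZeroCopySketch` and the method `sendFile` are hypothetical names, and whether a true zero-copy path is taken depends on the operating system and the target channel type.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sends part of a file to a channel using FileChannel.transferTo. */
final class ZeroCopySketch {

    /**
     * Transfers up to count bytes of the file, starting at position, to the target.
     * Returns the number of bytes actually transferred.
     */
    static long sendFile(Path file, long position, long count, WritableByteChannel target)
            throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long sent = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (sent < count) {
                long n = source.transferTo(position + sent, count - sent, target);
                if (n <= 0) {
                    break; // end of file reached, or the target accepted nothing
                }
                sent += n;
            }
            return sent;
        }
    }
}
```

With a socket channel as the target, the JVM can delegate to an OS facility such as sendfile where one is available, so the bytes never pass through user-space buffers.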
Replication and fault tolerance
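Replication ensures data safety. A message is committed once every replica in the ISR has it, and replicas that fall too far behind are removed from the ISR, so the ISR balances safety against write latency. The controller manages leader election and replica state. Configuration parameters such as min.insync.replicas and unclean.leader.election.enable, along with other broker settings, tune the trade-off between reliability and performance.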
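The effect of min.insync.replicas and unclean.leader.election.enable can be sketched as two small decision functions. This is a simplified model under stated assumptions, not the broker's code: `ReplicationPolicySketch`, `acceptWrite` and `electLeader` are hypothetical names, and the election rule assumed here is "first live replica in assignment order that is in the ISR, otherwise any live replica only if unclean election is enabled".

```java
import java.util.List;
import java.util.OptionalInt;
import java.util.Set;

/** Illustrative model of two replication-related settings. */
final class ReplicationPolicySketch {

    /** With acks=all, a write is accepted only if the ISR has at least minInsyncReplicas members. */
    static boolean acceptWrite(String acks, int isrSize, int minInsyncReplicas) {
        boolean acksAll = acks.equals("all") || acks.equals("-1");
        if (!acksAll) {
            return true; // min.insync.replicas is only enforced for acks=all
        }
        return isrSize >= minInsyncReplicas;
    }

    /** Picks a new leader when the current one fails; empty means the partition stays offline. */
    static OptionalInt electLeader(List<Integer> assignedReplicas, Set<Integer> isr,
                                   Set<Integer> liveBrokers, boolean uncleanElectionEnabled) {
        for (int replica : assignedReplicas) {
            if (isr.contains(replica) && liveBrokers.contains(replica)) {
                return OptionalInt.of(replica);
            }
        }
        if (uncleanElectionEnabled) {
            // An out-of-sync replica may lack committed messages, so this risks data loss.
            for (int replica : assignedReplicas) {
                if (liveBrokers.contains(replica)) {
                    return OptionalInt.of(replica);
                }
            }
        }
        return OptionalInt.empty();
    }
}
```

Disabling unclean election favors consistency (the partition is unavailable until an in-sync replica returns), while enabling it favors availability at the cost of possibly losing committed messages.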
Operational parameters
Typical broker, JVM and monitoring settings are listed, covering buffer sizes, log retention, GC options, and key metrics like leader elections, ISR changes, I/O rates, and CPU/disk usage.
Tools and scripts
Kafka provides command‑line utilities for topic management, broker start/stop, preferred‑replica election, partition reassignment, producer/consumer performance testing, and log inspection.
Products
The author’s team built a Kafka SDK with circuit‑breaker, degradation, rate‑limiting and monitoring features, and a metadata management platform for topic administration and permission handling.
Author
Dong Yanfeng, big‑data platform architect at Manbang Group.