Big Data 39 min read

Apache Kafka Overview: Architecture, Features, and Usage

This article provides a comprehensive introduction to Apache Kafka, covering its high‑throughput distributed architecture, core concepts such as topics, partitions, brokers, producers and consumers, design goals, performance characteristics, deployment steps, configuration, and example code for producers, consumers, and Spring Boot integration.

Qunar Tech Salon

Mar 19, 2020

Apache Kafka Overview: Architecture, Features, and Usage

Apache Kafka is a fast, scalable, high‑throughput, fault‑tolerant distributed publish‑subscribe messaging system written in Scala and Java. It offers O(1) message persistence, supports millions of messages per second, partitioned storage, replication, and both online and offline processing.

Key design goals include constant‑time persistence, high throughput, ordered delivery within partitions, support for both streaming and batch processing, and horizontal scalability.

Typical use cases are real‑time data pipelines and stream processing applications, such as user activity tracking, log collection, rate limiting, and high‑throughput data ingestion.

Kafka’s core components are:

Topic – a logical category of messages.

Partition – an ordered log segment within a topic; each partition is stored on a broker.

Segment – a fixed‑size file inside a partition.

Broker – a server that hosts partitions.

Producer – publishes records to topics, optionally specifying a key for routing.

Consumer – reads records; consumers can belong to a consumer group to share load and guarantee each message is processed once.

Consumer Group – coordinates partition assignment among its members.

Leader / Follower – each partition has a leader that handles reads/writes; followers replicate the leader.

ISR (In‑Sync Replicas) – the set of replicas that are fully caught up with the leader.

HW (High Watermark) and LEO (Log End Offset) – offsets used for durability and consumption control.

Message flow: a producer discovers the partition leader via Zookeeper, sends the record to the leader, the leader writes to its local log and replicates to followers, followers acknowledge, the leader updates HW and acknowledges the producer.

Routing strategies: if a partition is specified, the record goes there; otherwise the key’s hash modulo partition count is used; if no key, round‑robin selection occurs.

Reliability settings (acks): 0 – fire‑and‑forget, 1 – leader acknowledgment, -1 (all) – all ISR replicas must acknowledge before the producer receives an ack.

Consumer offset commit modes: automatic (periodic) or manual (synchronous commitSync() or asynchronous commitAsync() with optional callback). Manual commits avoid duplicate processing caused by consumer failures.

Duplicate‑consumption mitigation strategies include idempotent processing, unique message IDs with deduplication tables, database unique constraints, and version‑based optimistic concurrency.

Cluster deployment steps (single‑machine example): download Kafka, extract, configure server.properties for each broker (different broker.id and listeners), create log directories, start Zookeeper, start each broker, create topics with kafka-topics.sh, and use kafka-console-producer.sh / kafka-console-consumer.sh for testing.

Example Java producer using the native Kafka API:

Properties props = new Properties();
props.put("bootstrap.servers", "192.168.51.128:9092,192.168.51.128:9093,192.168.51.128:9094");
props.put("key.serializer", "org.apache.kafka.common.serialization.IntegerSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<Integer, String> producer = new KafkaProducer<>(props);
ProducerRecord<Integer, String> record = new ProducerRecord<>("test2", 0, 1, "hello world");
producer.send(record);
producer.close();

Example Java consumer with manual async commit and callback:

Properties props = new Properties();
props.put("bootstrap.servers", "192.168.51.128:9092,192.168.51.128:9093,192.168.51.128:9094");
props.put("group.id", "mygroup");
props.put("enable.auto.commit", "false");
props.put("key.deserializer", "org.apache.kafka.common.serialization.IntegerDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<Integer, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("test2"));
while (true) {
    ConsumerRecords<Integer, String> records = consumer.poll(1000);
    for (ConsumerRecord<Integer, String> rec : records) {
        System.out.println("topic=" + rec.topic() + ", key=" + rec.key() + ", value=" + rec.value());
    }
    consumer.commitAsync((offsets, e) -> {
        if (e != null) System.err.println("Commit failed: " + e);
    });
}

Spring Boot integration: add spring-kafka dependency, configure bootstrap servers, producer/consumer properties in application.properties, inject KafkaTemplate for sending messages, and annotate a method with @KafkaListener(topics = "${kafka.topic1}") to receive messages.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Distributed Systems Big Data Kafka Message Queue

Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.