What Is Kafka? Overview, Architecture, Features, Deployment, and Sample Code

This article explains Kafka as a distributed publish/subscribe messaging system. It covers Kafka's core functions, architecture, advantages, deployment methods, and common use cases, and provides Java consumer and producer examples for real-time data processing.

Rare Earth Juejin Tech Community

Feb 8, 2024

Apache Kafka, originally developed at LinkedIn and now maintained by the Apache Software Foundation, is a distributed, reliable publish/subscribe messaging system designed for real-time and streaming data.

It moves data between systems in real time, supports both online and batch processing, and offers high throughput, low latency, and easy scalability across multiple nodes.

1. Basic Functions

Producer/Consumer: reliable message delivery service allowing clients to publish and consume messages.

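Streams: a client library for processing and transforming data streams stored in Kafka. The processing runs inside your application, not on the brokers.

Connectors: connect Kafka to external systems such as databases and file systems for data transfer.

Kafka itself is written in Scala and Java and runs on the JVM, typically on POSIX-compatible operating systems.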

2. Core Architecture

Kafka consists of three main components: Producer, Consumer, and Broker.

Producer: an application that publishes messages to one or more topics.

Consumer: an application that subscribes to topics and consumes messages.

Broker: a Kafka server instance that receives messages from producers, stores them, and serves them to consumers on request.

Kafka provides a simple and reliable messaging service for real‑time data transfer between systems.
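
To make the three roles concrete, here is a minimal in-memory sketch in plain Java. It is not Kafka's storage engine or API; the class `ToyBroker` and its methods `publish` and `fetch` are hypothetical names chosen for illustration. It assumes a broker can be pictured as one append-only log per topic, with each consumer remembering the offset it has read up to.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory stand-in for a broker (illustrative only, not Kafka's real implementation). */
public class ToyBroker {
    // One append-only list of messages per topic name.
    private final Map<String, List<String>> logs = new HashMap<>();

    /** Producer side: append a message and return the offset it was stored at. */
    public synchronized long publish(String topic, String message) {
        List<String> log = logs.computeIfAbsent(topic, t -> new ArrayList<>());
        log.add(message);
        return log.size() - 1;
    }

    /** Consumer side: return a copy of every message stored at or after fromOffset. */
    public synchronized List<String> fetch(String topic, long fromOffset) {
        List<String> log = logs.getOrDefault(topic, new ArrayList<>());
        if (fromOffset >= log.size()) {
            return new ArrayList<>();
        }
        int start = (int) Math.max(0, fromOffset);
        return new ArrayList<>(log.subList(start, log.size()));
    }

    public static void main(String[] args) {
        ToyBroker broker = new ToyBroker();
        broker.publish("my-topic", "first");
        broker.publish("my-topic", "second");
        // Each consumer tracks its own offset, so reads do not remove messages.
        System.out.println(broker.fetch("my-topic", 0)); // prints [first, second]
        System.out.println(broker.fetch("my-topic", 1)); // prints [second]
    }
}
```

Because messages stay in the log after they are read, a second consumer (or a later batch job) can start from an earlier offset and see the same data.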

3. Implementation Concepts

Kafka’s implementation relies on two core concepts: the publish/subscribe model and partitioning.

Publish/Subscribe Model

Producers publish messages to topics; consumers subscribe to those topics to receive messages.

Partitioning

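Each topic is divided into multiple partitions. Messages with the same key are written to the same partition, ordering is preserved within a partition, and different partitions can be processed in parallel, which is how Kafka scales.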
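
The key-to-partition mapping can be sketched as a hash of the key modulo the partition count. The sketch below is an approximation under stated assumptions: Kafka's default partitioner hashes the serialized key bytes with murmur2, whereas this example substitutes `Arrays.hashCode` to stay dependency-free, so the partition numbers it produces will not match a real cluster. `KeyPartitioner` and `partitionFor` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative key-based partitioner; uses a stand-in hash, not Kafka's murmur2. */
public class KeyPartitioner {

    /** Maps a key to a partition number in the range [0, numPartitions). */
    public static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 6;
        for (String key : new String[] {"user-1", "user-2", "user-1"}) {
            // The same key always lands on the same partition.
            System.out.println(key + " -> partition " + partitionFor(key, partitions));
        }
    }
}
```

Since the mapping depends on the partition count, changing the number of partitions generally changes where a given key lands.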

4. Advantages and Disadvantages

Advantages

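Reliability: messages are persisted to disk and can be replicated across brokers, so data survives the failure of a single broker.

Scalability: supports many consumers and can expand by adding partitions and brokers.

Performance: handles large message volumes with high throughput and low latency.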
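
One way to picture why adding partitions lets more consumers share the work: within a consumer group (the `group.id` used in the consumer sample code), each partition is read by only one consumer at a time. The sketch below spreads partitions across consumers in simple round-robin order. It is a simplification, not one of Kafka's actual assignment strategies, and `RoundRobinAssignment` and `assign` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified round-robin spread of partitions over the consumers in one group. */
public class RoundRobinAssignment {

    /** Returns consumer name -> list of partition numbers that consumer owns. */
    public static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return result;
        }
        for (int p = 0; p < numPartitions; p++) {
            // Every partition gets exactly one owner.
            String owner = consumers.get(p % consumers.size());
            result.get(owner).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {c1=[0, 3], c2=[1, 4], c3=[2, 5]}
        System.out.println(assign(Arrays.asList("c1", "c2", "c3"), 6));
        // More consumers than partitions leaves some idle:
        // prints {c1=[0], c2=[1], c3=[], c4=[]}
        System.out.println(assign(Arrays.asList("c1", "c2", "c3", "c4"), 2));
    }
}
```

The second call shows the practical limit: once a group has more consumers than the topic has partitions, the extra consumers receive nothing until partitions are added.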

Disadvantages

Complexity: installation and configuration require solid networking and server infrastructure knowledge.

Latency: end-to-end message latency can increase under heavy load, for example when consumers fall behind producers.

Deployment Methods

To deploy Kafka, install the broker (server) either from a direct download or as a Docker container, then add a client library to each application. Client libraries are available for languages such as Java, Scala, Python, Go, C#, and C++.

Applications

Kafka is used for real‑time data processing, batch processing, log tracking, and system monitoring.

Real‑time Data Processing

Streams data between systems and enables aggregation, statistics, and reporting.
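
As an illustration of the kind of aggregation meant here, the sketch below counts events per key in fixed-size (tumbling) time windows. It runs over an in-memory list in plain Java; in a real deployment the events would come from a Kafka consumer or a Kafka Streams application, which this example does not use. The `Event` type, `WindowedCount`, and `countPerWindow` are hypothetical names, and timestamps are assumed to be non-negative milliseconds.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Counts events per key per tumbling window; plain Java, no Kafka dependency. */
public class WindowedCount {

    /** Hypothetical event shape: a key plus a timestamp in milliseconds. */
    public static final class Event {
        public final String key;
        public final long timestampMs;

        public Event(String key, long timestampMs) {
            this.key = key;
            this.timestampMs = timestampMs;
        }
    }

    /** Returns window start (ms) -> key -> number of events in that window. */
    public static Map<Long, Map<String, Integer>> countPerWindow(List<Event> events, long windowMs) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        Map<Long, Map<String, Integer>> counts = new TreeMap<>();
        for (Event e : events) {
            // Round the timestamp down to the start of its window.
            long windowStart = (e.timestampMs / windowMs) * windowMs;
            counts.computeIfAbsent(windowStart, w -> new TreeMap<>())
                  .merge(e.key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("login", 100), new Event("login", 900),
                new Event("click", 950), new Event("login", 1200));
        // prints {0={click=1, login=2}, 1000={login=1}}
        System.out.println(countPerWindow(events, 1000));
    }
}
```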

Batch Processing

Kafka retains messages on disk for a configurable period, so batch jobs can read them later, and partitions let those jobs read in parallel.

Log Tracking

Captures event logs in real time for analysis.

Monitoring

Publishes metrics for real‑time monitoring and analysis.

Sample Code

Consumer

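import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Configure the Kafka consumer
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
// Commit consumed offsets automatically once per second
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

// try-with-resources closes the consumer if the loop exits with an exception
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    // Subscribe to the topic
    consumer.subscribe(Collections.singletonList("my-topic"));
    // Poll for new records until the process is stopped
    while (true) {
        // poll(Duration) replaces the deprecated poll(long) overload
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
        }
    }
}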

Producer

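import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Configure the Kafka producer
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

// try-with-resources closes the producer, which waits for pending sends to complete
try (Producer<String, String> producer = new KafkaProducer<>(props)) {
    // Send ten messages without keys; send() is asynchronous
    for (int i = 0; i < 10; i++) {
        String msg = "Message " + i;
        producer.send(new ProducerRecord<>("my-topic", msg));
    }
}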
