What Is Kafka? Overview, Architecture, Features, Deployment, and Sample Code

Kafka is a distributed publish/subscribe messaging system developed under the Apache Software Foundation that provides reliable, high‑throughput, real‑time data streaming. This article explains its core concepts, architecture, advantages, deployment methods, and use cases, and includes Java code examples for a producer and a consumer.

Rare Earth Juejin Tech Community

Jan 13, 2024

What Is Kafka

Kafka is an Apache Software Foundation project that implements a distributed, reliable publish/subscribe messaging system for real‑time and stream data. It moves data from one system to another, supporting both online and offline processing.

Kafka offers high‑throughput, low‑latency message transport, can scale across many nodes, and runs on various POSIX‑compatible operating systems.

1. Core Features

Producers/Consumers – reliable message delivery allowing applications to publish and consume messages.

Streams – processing and transformation of data streams stored in Kafka, using the Kafka Streams client library; the processing runs inside the application, not on the brokers.

Connectors – integration with external systems for data flow in and out of Kafka.

2. Basic Architecture

Kafka consists of three main components: producers, consumers, and brokers.

Producer – an application that publishes messages to one or more topics.

Consumer – an application that subscribes to topics and consumes messages.

Broker – a Kafka server instance that receives, stores, and forwards messages.

Kafka provides a simple yet reliable messaging service for real‑time data transfer between systems.

3. Implementation Concepts

Kafka relies on two core concepts: the publish/subscribe model and partitioning.

Publish/Subscribe Model

Producers publish messages to topics; consumers subscribe to those topics to receive messages.
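To make the model concrete, here is a minimal in-memory sketch in plain Java. It is not Kafka's client API: `TinyBroker`, `subscribe`, and `publish` are hypothetical names, and this toy broker pushes messages to callbacks, whereas real Kafka consumers pull from the broker. It only illustrates that producers and consumers are connected through a topic name rather than to each other.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for a broker; not part of the Kafka API.
final class TinyBroker {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // Register a callback that receives every message later published to the topic.
    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    // Deliver the message to every subscriber of the topic; returns the number of deliveries.
    int publish(String topic, String message) {
        List<Consumer<String>> handlers =
                subscribers.getOrDefault(topic, Collections.emptyList());
        for (Consumer<String> handler : handlers) {
            handler.accept(message);
        }
        return handlers.size();
    }

    public static void main(String[] args) {
        TinyBroker broker = new TinyBroker();
        broker.subscribe("my-topic", m -> System.out.println("subscriber A got: " + m));
        broker.subscribe("my-topic", m -> System.out.println("subscriber B got: " + m));
        broker.publish("my-topic", "hello");   // both subscribers receive it
        broker.publish("other-topic", "lost"); // no subscribers, nothing is delivered
    }
}
```

The publisher never references a subscriber directly, so either side can be added or removed without changing the other.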

Partitioning

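Each topic is divided into partitions, and every message is written to exactly one of them. Partitions can be spread across brokers and read in parallel, which enables parallel processing and scalability.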
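One common way to choose a partition is to hash the message key. The sketch below shows the idea under simplifying assumptions: `PartitionSketch` and `partitionFor` are hypothetical names, and `String.hashCode()` stands in for the hash function; Kafka's built-in partitioner uses a different hash, so the partition numbers here will not match a real cluster.

```java
// Simplified key-based partition selection; hypothetical helper, not Kafka's partitioner.
final class PartitionSketch {
    // Map a key to a partition in [0, numPartitions). Math.floorMod keeps the
    // result non-negative even when hashCode() is negative.
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 3;
        for (String key : new String[] {"user-1", "user-2", "user-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, partitions));
        }
    }
}
```

Because the mapping depends only on the key and the partition count, messages with the same key land in the same partition as long as the number of partitions stays the same.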

4. Advantages and Disadvantages

Advantages

Reliability – messages are persisted to disk and can be replicated across brokers, so they can survive the failure of a single broker.

Scalability – can add brokers and partitions to increase capacity.

Performance – high throughput and low latency, even with large numbers of consumers.

Disadvantages

Complexity – requires solid networking and server infrastructure and technical expertise to install and configure.

Latency – may experience higher latency under heavy load.

5. Deployment Methods

Kafka can be deployed by installing the server and client applications.

Install Kafka server – via binary download or Docker container.

Install client libraries – supports Java, Scala, Python, Go, C#, C++, etc.

6. Applications

Kafka is used for real‑time data processing, batch processing, log aggregation, and monitoring.

Real‑time Data Processing

Streams data between systems and enables aggregation, statistics, and reporting.
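As a rough illustration of the aggregation step, the sketch below counts events per key over a list that stands in for a stream of records. The `AggregationSketch` class, the `countByKey` helper, and the sample event names are hypothetical; a real application would read the records from a Kafka consumer, or use the Kafka Streams library, instead of an in-memory list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical aggregation over an in-memory list; no Kafka classes are involved.
final class AggregationSketch {
    // Count how often each key occurs; TreeMap keeps keys sorted for stable printing.
    static Map<String, Long> countByKey(List<String> keys) {
        Map<String, Long> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> events =
                Arrays.asList("click", "view", "click", "purchase", "view", "click");
        System.out.println(countByKey(events)); // prints {click=3, purchase=1, view=2}
    }
}
```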

Batch Processing

Kafka retains messages on disk for a configurable period, so batch jobs can read them later, long after they were published.
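The idea of reading stored messages later can be sketched with an append-only list and a reader that remembers its position. `RetainedLog`, `append`, and `readFrom` are hypothetical names for illustration only; in Kafka the brokers store a log per partition and a consumer tracks its position as an offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical append-only log; a toy model of one partition, not a Kafka class.
final class RetainedLog {
    private final List<String> entries = new ArrayList<>();

    // Append a message and return its offset (its index in the log).
    long append(String message) {
        entries.add(message);
        return entries.size() - 1;
    }

    // Return a copy of all messages at or after the given offset; reading does not remove them.
    List<String> readFrom(long offset) {
        int start = (int) Math.max(0, Math.min(offset, entries.size()));
        return new ArrayList<>(entries.subList(start, entries.size()));
    }

    public static void main(String[] args) {
        RetainedLog log = new RetainedLog();
        for (int i = 0; i < 5; i++) {
            log.append("Message " + i);
        }
        long saved = 2; // position a batch job stored after its previous run
        System.out.println(log.readFrom(saved));    // prints [Message 2, Message 3, Message 4]
        System.out.println(log.readFrom(0).size()); // prints 5, earlier messages are still there
    }
}
```

Since reading leaves the log untouched, several independent jobs can each keep their own position and process the same messages at different times.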

Log Aggregation

Captures event logs in real time for analysis.

Monitoring

Publishes metrics for real‑time monitoring and analysis.

7. Sample Use Case: Real‑time Data Processing

Consumer Example

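import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerExample {
    public static void main(String[] args) {
        // Create Kafka consumer
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test");
        // Commit consumed offsets automatically once per second
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // try-with-resources closes the consumer if the loop exits with an exception
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Subscribe to topic
            consumer.subscribe(Arrays.asList("my-topic"));
            // Consume messages until the process is stopped
            while (true) {
                // Wait up to 100 ms for new records
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
                }
            }
        }
    }
}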

Producer Example

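import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerExample {
    public static void main(String[] args) {
        // Create Kafka producer
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // try-with-resources closes the producer, which waits for pending sends to complete
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish ten messages without a key to "my-topic"
            for (int i = 0; i < 10; i++) {
                String msg = "Message " + i;
                producer.send(new ProducerRecord<>("my-topic", msg));
            }
        }
    }
}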
