
Understanding Kafka: Architecture, Message Integrity, and Performance Considerations

This article explains Kafka's role as a distributed message queue, covering its architecture, replication mechanisms, producer‑consumer workflow, message integrity guarantees, schema management, and performance tuning for high‑throughput, low‑latency backend systems.

Architecture Digest

Jun 19, 2018


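Kafka is an open-source distributed message queue originally developed at LinkedIn. Before Kafka, inter-service communication relied on complex point-to-point connections between each system that produced data and each system that consumed it.

Imagine instead a single, very simple pipeline that aggregates all of this data in one step.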

Data producers generate messages while consumers process them, and the pipeline between them functions as a message queue.

Kafka can also be used in database replication scenarios.

In this architecture, several key aspects must be ensured:

Message integrity: no message loss, in-order delivery, exactly-once semantics.

Message schema: schema registry, serialization / deserialization.

Performance: high throughput, low latency, handling large messages.

Producer behavior: messages are batched and sent asynchronously.

In the sender thread, messages are taken from the Record Accumulator’s buffer and sent to the broker.
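The batching and asynchronous sending described above can be sketched with a buffer and a background thread. This is a toy model, not Kafka's client: the class name BatchingProducer, the deliver hook, and the batch_size and linger_s parameters are all hypothetical names chosen for illustration.

```python
import queue
import threading


class BatchingProducer:
    """Toy producer: send() only buffers; a background thread ships batches."""

    def __init__(self, deliver, batch_size=3, linger_s=0.05):
        self._buffer = queue.Queue()   # stand-in for the Record Accumulator
        self._deliver = deliver        # hypothetical "send batch to broker" hook
        self._batch_size = batch_size
        self._linger_s = linger_s
        self._closed = threading.Event()
        self._sender = threading.Thread(target=self._run, daemon=True)
        self._sender.start()

    def send(self, record):
        # Asynchronous: returns as soon as the record is buffered.
        self._buffer.put(record)

    def _run(self):
        # Keep going until close() was called and the buffer is drained.
        while not (self._closed.is_set() and self._buffer.empty()):
            batch = []
            try:
                # Wait up to linger_s for the first record of a batch.
                batch.append(self._buffer.get(timeout=self._linger_s))
                # Then take whatever is already buffered, up to batch_size.
                while len(batch) < self._batch_size:
                    batch.append(self._buffer.get_nowait())
            except queue.Empty:
                pass
            if batch:
                self._deliver(batch)

    def close(self):
        self._closed.set()
        self._sender.join()


batches = []
producer = BatchingProducer(batches.append, batch_size=3)
for i in range(7):
    producer.send(i)
producer.close()
```

How the seven records are grouped depends on timing, but no batch exceeds batch_size and the records keep their send order because a single sender thread drains a FIFO buffer.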

For failover, data must be replicated across brokers. With acks=all (equivalently acks=-1), every broker in the ISR (in-sync replica set) must have the message before the producer receives an ack.

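The producer's acks setting controls how strong this guarantee is. With acks=0 the producer does not wait for any acknowledgment and behaves much like UDP (fire and forget). With acks=1 only the partition leader must write the message before acknowledging, so an acknowledged message can still be lost if the leader crashes before the followers replicate it. With acks=-1 (all), the leader waits for the whole ISR, so an acknowledged message is not lost even if the leader crashes.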
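A small simulation can make the difference between the acks levels concrete. It is a simplified model under stated assumptions: logs are plain Python lists, the function name produce and the leader_crashes flag are hypothetical, and real brokers involve fetch requests, timeouts, and retries that are not modeled here.

```python
def produce(record, leader, followers, acks, leader_crashes=False):
    """Simulate one send and return (acked, survives_failover).

    acks: 0 (do not wait), 1 (leader only) or -1 (all in-sync replicas).
    leader_crashes: the leader dies after its local write but before any
    follower has copied the record.
    """
    leader.append(record)              # the leader always writes first
    if leader_crashes:
        # acks=0 and acks=1 have already reported success to the producer;
        # acks=-1 never acknowledges, so the producer knows it must retry.
        acked = acks in (0, 1)
        return acked, False            # no follower holds the record
    for log in followers:
        log.append(record)             # in-sync followers replicate it
    return True, True


for level in (0, 1, -1):
    acked, survives = produce("m", [], [[], []], level, leader_crashes=True)
    print(level, "silent loss" if acked and not survives else "no silent loss")
```

In this model only acks=-1 avoids the case where the producer believes a message was stored although no surviving replica has it.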

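To guarantee in-order delivery within a partition, set the producer configuration max.in.flight.requests.per.connection=1; otherwise a failed batch that is retried can land after a later batch that already succeeded. (Newer client versions with enable.idempotence=true can preserve ordering with up to five in-flight requests.)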
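The reordering risk can be shown with a toy model of in-flight requests. The function deliver and its parameters are hypothetical; the model assumes each batch in fail_once fails exactly once and is retried after the rest of its window has been processed.

```python
def deliver(batches, max_in_flight, fail_once):
    """Return the order in which batches land in the partition log.

    fail_once: batches whose first attempt fails and is then retried.
    """
    log, pending, already_failed = [], list(batches), set()
    while pending:
        # Up to max_in_flight batches are sent without waiting for acks.
        window, pending = pending[:max_in_flight], pending[max_in_flight:]
        retries = []
        for batch in window:
            if batch in fail_once and batch not in already_failed:
                already_failed.add(batch)
                retries.append(batch)   # will be resent in the next round
            else:
                log.append(batch)       # written to the log
        pending = retries + pending
    return log


print(deliver(["A", "B", "C"], max_in_flight=2, fail_once={"A"}))  # ['B', 'A', 'C']
print(deliver(["A", "B", "C"], max_in_flight=1, fail_once={"A"}))  # ['A', 'B', 'C']
```

With two requests in flight, B is written while A is still being retried; with one request in flight, nothing is sent until A has succeeded.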

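When a send fails, the producer's callback should close the producer with a timeout of 0. Closing immediately avoids hanging inside the callback and prevents messages still in the buffer from being sent out of order after the failure.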

Setting min.insync.replicas=2 (together with acks=all) ensures that at least two brokers keep the latest committed messages, providing redundancy if one of them fails.
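In Kafka the setting is named min.insync.replicas, and it is only enforced for producers that use acks=all. The check can be sketched as a simplified model; the function name accept_write is hypothetical, and a real broker rejects the request with a "not enough replicas" error rather than returning a boolean.

```python
def accept_write(isr_size, min_insync_replicas, acks):
    """Would the leader accept a produce request? (simplified model)"""
    if acks in (-1, "all"):
        # The write is refused when too few replicas are in sync.
        return isr_size >= min_insync_replicas
    # acks=0 and acks=1 are not subject to min.insync.replicas.
    return True


print(accept_write(isr_size=3, min_insync_replicas=2, acks="all"))  # True
print(accept_write(isr_size=1, min_insync_replicas=2, acks="all"))  # False
print(accept_write(isr_size=1, min_insync_replicas=2, acks=1))      # True
```

The trade-off is availability: with only one in-sync replica left, acks=all producers are refused until a second replica catches up.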

Exactly-once delivery is handled on the consumer side using a global epoch and a transition sequence number (together referred to here as the ETS value); consumers discard any message whose ETS is smaller than that of a message they have already processed.
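Consumer-side deduplication of this kind can be sketched as follows. The class DedupConsumer and its fields are hypothetical, the ETS is modeled as an (epoch, sequence) tuple, and the sketch assumes that a message with an ETS equal to the last applied one is a redelivery and is also discarded.

```python
class DedupConsumer:
    """Applies each message at most once, keyed by (epoch, sequence)."""

    def __init__(self):
        self.last_ets = None   # highest (epoch, sequence) applied so far
        self.applied = []

    def on_message(self, ets, payload):
        # Tuples compare by epoch first, then by sequence number.
        if self.last_ets is not None and ets <= self.last_ets:
            return False       # duplicate or stale message: discard
        self.applied.append(payload)
        self.last_ets = ets
        return True


consumer = DedupConsumer()
for ets, payload in [((1, 1), "a"), ((1, 2), "b"), ((1, 2), "b"),
                     ((1, 1), "a"), ((2, 1), "c")]:
    consumer.on_message(ets, payload)
print(consumer.applied)  # ['a', 'b', 'c']
```

For this to survive a consumer restart, last_ets would have to be stored together with the processed result, which the sketch does not attempt.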

Enabling “acks=all” improves durability but increases latency.

To handle large messages without depending on external storage, the producer splits a large message into segments and the consumer reassembles them.

During consumer assembly, in-order delivery is maintained by buffering segments until a message is complete and by tracking offsets at both the head and the tail of the buffered stream.
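A minimal sketch of split and assembly, assuming each segment carries a message ID, its index, and the total segment count. The function split, the class Assembler, and the segment layout are hypothetical, and offset tracking for commits is left out.

```python
import uuid


def split(payload: bytes, segment_size: int):
    """Producer side: cut one large payload into ordered segments."""
    message_id = uuid.uuid4().hex
    chunks = [payload[i:i + segment_size]
              for i in range(0, len(payload), segment_size)] or [b""]
    return [(message_id, index, len(chunks), chunk)
            for index, chunk in enumerate(chunks)]


class Assembler:
    """Consumer side: buffer segments until a message is complete."""

    def __init__(self):
        self._partial = {}   # message_id -> {index: chunk}

    def on_segment(self, segment):
        message_id, index, total, chunk = segment
        parts = self._partial.setdefault(message_id, {})
        parts[index] = chunk
        if len(parts) < total:
            return None      # still waiting for more segments
        del self._partial[message_id]
        return b"".join(parts[i] for i in range(total))


assembler = Assembler()
outputs = [assembler.on_segment(s) for s in split(b"abcdefghij", 4)]
print(outputs)  # [None, None, b'abcdefghij']
```

Because segments are keyed by message ID, segments of different large messages can interleave in the same partition without corrupting each other.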

All messages passing through Kafka must follow a fixed format; a centralized schema registry assigns a schema ID, which is sent with the message. Consumers retrieve the schema by ID to deserialize the payload, keeping overhead low.
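The schema-ID mechanism can be illustrated with an in-memory stand-in for the registry. Everything here is an assumption for illustration: the SCHEMAS dictionary, the function names, the 4-byte big-endian ID prefix, and the use of JSON for the body (production systems typically use a compact binary format such as Avro).

```python
import json
import struct

SCHEMAS = {}   # in-memory stand-in for a centralized schema registry


def register(schema_id, field_names):
    SCHEMAS[schema_id] = list(field_names)


def serialize(schema_id, record):
    """Prefix the payload with the schema ID instead of the full schema."""
    fields = SCHEMAS[schema_id]
    body = json.dumps([record[name] for name in fields]).encode("utf-8")
    return struct.pack(">I", schema_id) + body


def deserialize(message):
    """Read the ID, look the schema up, then decode the payload."""
    (schema_id,) = struct.unpack(">I", message[:4])
    fields = SCHEMAS[schema_id]
    values = json.loads(message[4:].decode("utf-8"))
    return dict(zip(fields, values))


register(7, ["user", "clicks"])
message = serialize(7, {"user": "ann", "clicks": 3})
print(deserialize(message))  # {'user': 'ann', 'clicks': 3}
```

Only four bytes of schema information travel with each message; field names live in the registry, which is what keeps the per-message overhead low.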

The above summarizes the key points from a LinkedIn technical talk.

Source: https://mp.weixin.qq.com/s/uSGmlk2OzryWfHv_enbwCw


