Understanding Kafka: Architecture, Message Integrity, and Performance Considerations
This article explains Kafka's role as a distributed message queue, covering its architecture, replication mechanisms, producer‑consumer workflow, message integrity guarantees, schema management, and performance tuning for high‑throughput, low‑latency backend systems.
Kafka is an open-source message queue developed at LinkedIn. Before Kafka, inter-service communication relied on a tangle of point-to-point connections between data sources and the systems that consumed them.
Imagine instead a single, simple pipeline through which all of that data is aggregated in one step.
Data producers generate messages while consumers process them, and the pipeline between them functions as a message queue.
Kafka can also be used in database replication scenarios.
In this architecture, several key aspects must be ensured:
Message integrity: no message loss, in-order delivery, and exactly-once semantics.
Message schema: a schema registry, plus serialization and deserialization.
Performance: high throughput, low latency, and handling of large messages.
Producer behavior: the producer batches messages and sends them asynchronously. A call to send() appends the record to a buffer in the Record Accumulator and returns; a separate sender thread takes batches from that buffer and sends them to the broker.
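The batching step can be illustrated with a small toy model. This is a sketch under stated assumptions, not the Kafka client: the names ToyAccumulator and sender_loop, and the record-count batch size, are hypothetical and chosen only to show how an accumulator decouples the caller from the sender.

```python
from collections import defaultdict


class ToyAccumulator:
    """Toy stand-in for a record accumulator: one open batch per partition."""

    def __init__(self, batch_size=3):
        self.batch_size = batch_size           # max records per batch (hypothetical unit)
        self.open_batches = defaultdict(list)  # partition -> records in a batch that is not full yet
        self.ready = []                        # (partition, batch) pairs waiting for the sender

    def append(self, partition, record):
        """Called by the application thread; it only buffers and returns."""
        batch = self.open_batches[partition]
        batch.append(record)
        if len(batch) >= self.batch_size:
            self.ready.append((partition, batch))
            self.open_batches[partition] = []

    def flush(self):
        """Mark every non-empty open batch as ready, full or not."""
        for partition, batch in sorted(self.open_batches.items()):
            if batch:
                self.ready.append((partition, batch))
        self.open_batches.clear()

    def drain(self):
        """Called by the sender: take all batches that are ready."""
        drained, self.ready = self.ready, []
        return drained


def sender_loop(accumulator, broker_log):
    """One pass of a toy sender: one 'request' per drained batch."""
    requests = 0
    for partition, batch in accumulator.drain():
        broker_log.setdefault(partition, []).extend(batch)
        requests += 1
    return requests
```

Appending seven records to one partition with batch_size=3 yields two full batches for the sender and leaves one record buffered until flush() is called, which is the throughput-for-latency trade that batching makes.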
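For fail-over, data must be replicated. How many replicas must have a message before the producer receives an ack depends on the producer's acks setting.
With acks=0 the producer does not wait for any response, so sending is fire-and-forget, much like UDP. With acks=1 the leader acknowledges as soon as it has written the message locally, so the message can still be lost if the leader crashes before the followers replicate it. With acks=-1 (equivalently acks=all) the leader acknowledges only after all brokers in the ISR (in-sync replica set) have the message, so an acknowledged message survives a leader crash as long as another in-sync replica remains.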
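The effect of the acks setting on a leader crash can be sketched with a toy model. Nothing here is Kafka code: ToyPartition, its single follower, and the dropped flag are hypothetical simplifications, and the crash is assumed to strike before any background replication has happened.

```python
class ToyPartition:
    """One leader and one follower; the follower only catches up when replicate() runs."""

    def __init__(self):
        self.leader = []
        self.follower = []

    def produce(self, msg, acks, dropped=False):
        """Return what the producer believes: True means 'sent successfully'."""
        if dropped:
            # acks=0 never waits for a response, so the loss goes unnoticed.
            # With acks=1 or acks="all" the missing ack tells the producer to retry.
            return acks == 0
        self.leader.append(msg)
        if acks == "all":
            self.replicate()  # the ack is withheld until the in-sync follower has the record
        return True

    def replicate(self):
        self.follower = list(self.leader)

    def leader_crash(self):
        """The follower takes over; anything it never received is gone."""
        self.leader = list(self.follower)


def survives_leader_crash(acks):
    """Produce one message, crash the leader at once, and check the new leader's log."""
    partition = ToyPartition()
    partition.produce("order-1", acks)
    partition.leader_crash()
    return "order-1" in partition.leader
```

In this model only acks="all" keeps the message across the crash; acks=1 loses it even though the producer received an ack, and acks=0 additionally cannot tell when a message never reached the leader at all.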
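To guarantee in-order delivery when retries are enabled, set max.in.flight.requests.per.connection=1; with more than one request in flight, a failed batch that is retried can land behind a later batch that already succeeded. Newer Kafka clients with enable.idempotence=true also preserve ordering with up to 5 in-flight requests.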
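A toy simulation makes the reordering concrete. The deliver function below is a hypothetical model, not the client's actual retry logic: it assumes batches are sent in rounds of max_in_flight, and that a batch failing on its first attempt is re-queued behind whatever else was in flight in that round.

```python
def deliver(batches, max_in_flight, fail_once):
    """Return the order in which batches reach the toy broker log.

    Batches named in `fail_once` fail on their first attempt and are retried.
    """
    pending = list(batches)
    failed_already = set()
    log = []
    while pending:
        # Send up to max_in_flight batches without waiting for responses.
        in_flight, pending = pending[:max_in_flight], pending[max_in_flight:]
        retries = []
        for batch in in_flight:
            if batch in fail_once and batch not in failed_already:
                failed_already.add(batch)
                retries.append(batch)   # retried in the next round
            else:
                log.append(batch)       # written to the broker log now
        pending = retries + pending
    return log
```

With two requests in flight, deliver(["B1", "B2", "B3"], 2, {"B1"}) writes B2 before the retried B1; with max_in_flight=1 the same failure leaves the order intact, at the cost of sending one batch at a time.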
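When a send fails, the producer's callback should close the producer with a timeout of 0. The callback runs on the sender thread, so a close that waits for pending sends would hang; a zero timeout makes the close return immediately and fails the batches still queued behind the failed one instead of sending them.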
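Setting min.insync.replicas=2 (min.isr for short) together with acks=all ensures that a write is acknowledged only when at least two brokers hold it, so the latest committed messages always have a redundant copy.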
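The admission rule can be written down as a small function. This is a sketch of the rule as commonly documented, not broker code: accept_write and the NotEnoughReplicas class are hypothetical names, and the leader is assumed to be the first element of the ISR list.

```python
class NotEnoughReplicas(Exception):
    """Raised by the toy broker when the ISR is too small for an acks=all write."""


def accept_write(isr, min_insync_replicas, acks):
    """Return the replicas that must hold the record before the ack is sent."""
    if acks == "all":
        if len(isr) < min_insync_replicas:
            # The write is rejected rather than stored with too little redundancy.
            raise NotEnoughReplicas(f"ISR size {len(isr)} < {min_insync_replicas}")
        return list(isr)   # every current ISR member must have the record
    if acks == 1:
        return isr[:1]     # leader only; the minimum is not enforced
    return []              # acks=0: nobody is waited for
```

Note that the minimum only bites for acks=all: with an ISR shrunk to one broker, an acks=1 write is still accepted, while an acks=all write fails until a second replica catches up.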
Exactly-once delivery is handled on the consumer side using a global epoch and a transition sequence number (apparently what the notes abbreviate as ETS): the consumer discards any message whose ETS value is smaller than one it has already processed.
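One way to picture this consumer-side filtering is shown below. It is a minimal sketch under assumptions the source does not spell out: each message is taken to carry an (epoch, sequence) pair, pairs are compared epoch first, and a message equal to or older than the last processed pair is treated as a duplicate. DedupConsumer is a hypothetical name.

```python
class DedupConsumer:
    """Drops any message whose (epoch, sequence) is not newer than the last one processed."""

    def __init__(self):
        self.last_seen = (-1, -1)  # (epoch, sequence); tuples compare epoch first
        self.processed = []

    def on_message(self, epoch, sequence, payload):
        """Process the payload at most once; return False if it was discarded."""
        stamp = (epoch, sequence)
        if stamp <= self.last_seen:
            return False           # duplicate or stale replay: discard
        self.last_seen = stamp
        self.processed.append(payload)
        return True
```

A redelivered (1, 2) or a replayed (1, 1) is dropped once (1, 2) has been processed, while (2, 1) is accepted because the higher epoch outranks any sequence number from the earlier epoch.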
Enabling “acks=all” improves durability but increases latency.
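To handle large messages without depending on external storage, the producer splits a large message into segments and the consumer reassembles them.
During reassembly, the consumer buffers segments until a message is complete and tracks offsets at both ends of what it has buffered, the head (the earliest incomplete message) and the tail (the latest segment received), so that in-order delivery is maintained.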
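Splitting and reassembly can be sketched as follows. This is an illustrative model, not the wire format used in the talk: the segment tuple (message_id, index, total, chunk) and the names split_message and Assembler are assumptions, and offset tracking is left out to keep the example short.

```python
def split_message(message_id, payload, segment_size):
    """Split bytes into segments tagged (message_id, index, total, chunk)."""
    chunks = [payload[i:i + segment_size]
              for i in range(0, len(payload), segment_size)] or [b""]
    return [(message_id, i, len(chunks), chunk) for i, chunk in enumerate(chunks)]


class Assembler:
    """Buffers segments per message and emits the payload once all have arrived."""

    def __init__(self):
        self.buffers = {}  # message_id -> {index: chunk}

    def on_segment(self, segment):
        """Return the full payload when the last missing segment arrives, else None."""
        message_id, index, total, chunk = segment
        parts = self.buffers.setdefault(message_id, {})
        parts[index] = chunk
        if len(parts) < total:
            return None                    # still incomplete: keep buffering
        del self.buffers[message_id]       # release the buffer once assembled
        return b"".join(parts[i] for i in range(total))
```

When segments of two messages interleave in the log, each message is emitted at the position of its final segment, so a small message that completes early is delivered before a large one that started first.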
All messages passing through Kafka must follow a fixed format; a centralized schema registry assigns a schema ID, which is sent with the message. Consumers retrieve the schema by ID to deserialize the payload, keeping overhead low.
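The schema-ID mechanism can be illustrated with an in-memory stand-in for the registry. Everything here is assumed for the sake of the example: ToyRegistry, the 4-byte big-endian ID prefix, and the JSON-encoded value list are not the format described in the talk, only one simple way to show why shipping an ID instead of the schema keeps overhead low.

```python
import json
import struct


class ToyRegistry:
    """In-memory stand-in for a schema registry: schema <-> integer ID."""

    def __init__(self):
        self.by_id = {}
        self.ids = {}

    def register(self, schema):
        """Return the ID for this schema, assigning a new one on first sight."""
        key = json.dumps(schema, sort_keys=True)
        if key not in self.ids:
            self.ids[key] = len(self.ids) + 1
            self.by_id[self.ids[key]] = schema
        return self.ids[key]

    def lookup(self, schema_id):
        return self.by_id[schema_id]


def serialize(registry, schema, record):
    """Producer side: prefix the field values with the schema ID."""
    schema_id = registry.register(schema)
    body = json.dumps([record[f] for f in schema["fields"]]).encode()
    return struct.pack(">I", schema_id) + body  # 4-byte ID prefix (assumed framing)


def deserialize(registry, data):
    """Consumer side: read the ID, fetch the schema, rebuild the record."""
    (schema_id,) = struct.unpack(">I", data[:4])
    schema = registry.lookup(schema_id)
    values = json.loads(data[4:].decode())
    return dict(zip(schema["fields"], values))
```

The message on the wire carries only four bytes of schema information plus the field values; field names live in the registry and are looked up by the consumer.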
The above summarizes the key points from the LinkedIn sharing session.
Source: https://mp.weixin.qq.com/s/uSGmlk2OzryWfHv_enbwCw
