How Kafka Guarantees Data Reliability and Consistency
This article explains Kafka's mechanisms for ensuring data reliability through partition replicas, producer acknowledgments, and leader election, and describes how the high‑water‑mark and ISR concepts maintain strong data consistency across the cluster.
Data Reliability
Kafka is a commercial‑grade distributed messaging system, and its reliability is crucial. This article examines reliability from the perspectives of producer‑to‑broker message flow, topic partition replicas, and leader election.
Topic Partition Replicas
Before version 0.8.0 Kafka had no replica concept, so it was used only for non‑critical data. Starting with 0.8.0, Kafka introduced partition replicas (see KAFKA‑50). Each partition can be configured with a replication factor, typically 3.
Within a partition, one replica acts as the Leader and the others as Followers. All read and write operations go through the Leader, while Followers continuously replicate data from it. If the Leader fails, a follower is promoted to Leader, providing redundancy and reliability.
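The leader/follower arrangement can be sketched as a toy model. This is pure Python with invented names for illustration only, not Kafka's internal API:

```python
# Toy model of a partition with one leader and two follower replicas.
# Class and method names are illustrative, not Kafka internals.

class Replica:
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []  # the replica's local copy of the partition log

    def append(self, record):
        self.log.append(record)

class Partition:
    def __init__(self, replication_factor=3):
        self.replicas = [Replica(i) for i in range(replication_factor)]
        self.leader = self.replicas[0]       # one replica acts as Leader
        self.followers = self.replicas[1:]   # the rest replicate from it

    def produce(self, record):
        # All writes go through the Leader.
        self.leader.append(record)

    def replicate(self):
        # Followers fetch whatever records they are missing from the Leader.
        for f in self.followers:
            f.log.extend(self.leader.log[len(f.log):])

p = Partition(replication_factor=3)
p.produce("Message1")
p.produce("Message2")
p.replicate()
print([len(r.log) for r in p.replicas])  # → [2, 2, 2]
```

After replication, every replica holds the same log, which is what makes failover to a follower safe.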
Producer Sending Messages to Broker
Producers send messages to Kafka topics, which consist of multiple partitions and replicas. Kafka offers a message acknowledgment mechanism configured via the acks parameter (previously request.required.acks, see KAFKA-3043). The three possible values are:
acks = 0: The producer assumes success as soon as the message leaves the client, offering the highest throughput but risking message loss.
acks = 1: The Leader acknowledges after writing the message to its log (not necessarily flushed to disk). Data may still be lost if the Leader crashes before replication.
acks = all (or request.required.acks = -1): The Leader waits for all in-sync replicas to acknowledge before responding, optionally combined with min.insync.replicas to require a minimum number of in-sync replicas. This provides the strongest durability but the lowest throughput.
Depending on the use case, different acks settings are chosen to balance reliability and performance. Additionally, producers can operate in synchronous (producer.type=sync) or asynchronous (producer.type=async) mode; synchronous mode is required for maximum reliability.
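The durability trade-off among the three acks settings can be modeled in a few lines. This is a toy sketch, not the Kafka client API; every name below is invented for illustration:

```python
# Toy model of how a produce request is answered under each acks mode.
# All function and parameter names here are invented, not Kafka API.

def send(acks, leader_write_ok=True, isr_ack_count=3, min_insync_replicas=2):
    if acks == 0:
        # Fire-and-forget: success is assumed before any broker response.
        return "assumed-success"
    if acks == 1:
        # Leader-only: acknowledged once the Leader has appended the record.
        return "ack" if leader_write_ok else "error"
    if acks == "all":
        # The Leader waits for every in-sync replica; min.insync.replicas
        # sets the floor on how many replicas must acknowledge.
        if leader_write_ok and isr_ack_count >= min_insync_replicas:
            return "ack"
        return "error: NotEnoughReplicas"

print(send(acks=0))                       # → assumed-success (even if brokers lose it)
print(send(acks=1, leader_write_ok=False))  # → error
print(send(acks="all", isr_ack_count=1))  # → error: NotEnoughReplicas
```

Note how acks=all with too few in-sync replicas refuses the write outright: Kafka prefers rejecting the produce request over silently accepting data that cannot meet the configured durability floor.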
Leader Election
Kafka maintains an ISR (in-sync replica) list for each partition, containing followers that are caught up with the Leader. Only replicas in the ISR are eligible to become the new Leader when the current Leader fails (provided unclean.leader.election.enable=false).
When a Leader crashes, Kafka selects the first surviving replica in the ISR as the new Leader. Because every ISR member holds all messages committed by the in-sync replica set, no acknowledged data is lost in the failover, preserving data reliability.
To achieve reliable data storage, the following configurations are recommended:
Producer level: acks=all (or request.required.acks=-1) and synchronous mode producer.type=sync.
Topic level: replication.factor>=3 and min.insync.replicas>=2.
Broker level: disable unclean leader election with unclean.leader.election.enable=false.
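The recommendations above can be collected into a configuration sketch. The property names are real Kafka settings; note that the replication factor is not a properties-file entry but is chosen when the topic is created:

```properties
# Producer configuration
acks=all

# Topic-level configuration (the replication factor itself is set at topic
# creation, e.g. kafka-topics --create --replication-factor 3)
min.insync.replicas=2

# Broker configuration
unclean.leader.election.enable=false
```

With replication.factor=3 and min.insync.replicas=2, the partition tolerates one replica failure while still accepting acks=all writes; losing a second in-sync replica makes producers fail fast rather than write under-replicated data.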
Data Consistency
Consistency means that both the old Leader and any newly elected Leader present the same data to consumers. Kafka achieves this using the High Water Mark (HWM) and ISR mechanisms.
Assume a partition has three replicas (0 as Leader, 1 and 2 as Followers) all in the ISR. If the Leader has written Message4 but only Message2 has been replicated to all ISR members, the HWM is at Message2, so consumers can only read up to Message2. Messages above the HWM are considered unsafe because they may be lost if the Leader fails before replication.
This design ensures that consumers never read messages that might disappear after a leader change, preserving strong consistency at the cost of added latency before messages become visible. The replica.lag.time.max.ms setting controls how long a follower may lag before it is dropped from the ISR, which in turn affects how quickly the high water mark can advance.
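The high-water-mark rule in the example above reduces to a minimum over the in-sync replicas' log end offsets. A minimal sketch (invented names, not Kafka internals):

```python
# Toy computation of the high water mark: the minimum log end offset
# across all in-sync replicas. Consumers may read only below it.

def high_water_mark(log_end_offsets):
    return min(log_end_offsets)

# The Leader has written 4 messages (offsets 0-3), but the followers
# have replicated only 2 and 3 of them, respectively.
log_end_offsets = {"leader": 4, "follower-1": 2, "follower-2": 3}

hwm = high_water_mark(log_end_offsets.values())
print(hwm)  # → 2: consumers can read offsets 0 and 1 only
```

This matches the Message4/Message2 scenario: even though the Leader holds four messages, only those replicated to every ISR member are exposed, so a failover to either follower cannot make previously read data vanish.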