Handling Transactions, Failover, and Exactly‑Once Semantics in Distributed Systems
This article explores how distributed systems determine node liveness, manage failover and recovery, and implement at‑most‑once, at‑least‑once, and exactly‑once processing guarantees—including opaque transactions and two‑phase commit—using examples from Kafka, Zookeeper, and big‑data pipelines.
The article examines practical approaches to dealing with transactions in distributed systems, starting from how to judge whether a node is alive (heartbeat with Zookeeper, in‑sync replication) and moving on to failover and recovery strategies.
It explains that a node is considered alive when it can maintain a heartbeat with Zookeeper and, if it is a follower, faithfully replicate the leader’s data changes. Code examples illustrate heartbeat checks and node‑failure detection:
var timer = new timer() .setInterval(10sec) .onTime(slave-nodes,function(slave-nodes){ slave-nodes.forEach( node -> { boolean isAlive = node.heartbeatACK(15sec); if(!isAlive) { node.numNotAlive += 1; if(node.numNotAlive >= 3) { node.declareDeadOrFailed(); slave-nodes.remove(node); } } else { node.numNotAlive = 0; } }); }); timer.run();
Failover is described as straightforward when multiple slaves exist, while recovery is more complex because it must handle data requests that fail during node restart. The article outlines three processing guarantees:
At‑most‑once – data may be lost if a follower writes its offset before processing the payload and then crashes.
At‑least‑once – data may be duplicated when a follower crashes after processing but before writing its offset.
Exactly‑once – requires atomic updates of both offset and data, which is difficult without strong transactional support.
To achieve exactly‑once semantics, the article introduces the concept of an Opaque Transaction , where each batch of records shares a unique, ordered transaction ID. By storing the previous reach value (prevReach) alongside the current value, the system can safely apply updates even when transactions are retried:
// old data { transactionId:3, urlId:99, prevReach:2, reach:3 } // new data (same transactionId) { transactionId:3, urlId:99, reach:5 } // update using prevReach { transactionId:3, urlId:99, prevReach:2, reach:7 // 2 + 5 }
The article also presents a big‑data example: computing the “reach” of a Twitter URL by aggregating followers of all users who retweeted the URL. It shows SQL queries to fetch retweeters and their followers, then a Java‑like snippet that de‑duplicates followers using a hash map.
var url = queryArgs.getUrl(); var tweeters = getUrlToTweetersMap(); var result = new HashMap<String,Integer>(); tweeters.forEach(t -> { var followers = getFollowers(t.tweeter_id); followers.forEach(f -> { result.put(f.user_id,1); }); }); return result.size(); // Reach
For large‑scale processing, the article suggests using Storm/Trident to parallelise the computation, showing a topology that queries static states, expands lists, groups by follower, de‑duplicates, and finally counts the reach.
TridentState urlToTweeters = topology.newStaticState(getUrlToTweetersState()); TridentState tweetersToFollowers = topology.newStaticState(getTweeterToFollowersState()); topology.newDRPCStream("reach") .stateQuery(urlToTweeters, new Fields("args"), new MapGet(), new Fields("tweeters")) .each(new Fields("tweeters"), new ExpandList(), new Fields("tweeter")) .shuffle() .stateQuery(tweetersToFollowers, new Fields("tweeter"), new MapGet(), new Fields("followers")) .parallelismHint(200) .each(new Fields("followers"), new ExpandList(), new Fields("follower")) .groupBy(new Fields("follower")) .aggregate(new One(), new Fields("one")) .parallelismHint(20) .aggregate(new Count(), new Fields("reach"));
Finally, the article touches on two‑phase commit using Zookeeper for coordination, noting that when a database lacks atomic operations, a manual two‑phase commit can provide the needed consistency.
In summary, the piece provides a hands‑on overview of node health checks, failover/recovery, processing guarantees, opaque transactions, and coordination techniques essential for building reliable distributed systems.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
