Handling Transactions, Failover, and Exactly‑Once Semantics in Distributed Systems

This article explores practical techniques for handling node liveness, failover, recovery, and exactly‑once transaction semantics in distributed systems. It illustrates implementations with ZooKeeper, Kafka, Storm, and database sharding, and addresses big‑data reach calculations and performance trade‑offs.

Architecture Digest

May 14, 2017

The article discusses the challenges of implementing transactions in large‑scale distributed systems, focusing on how to determine node liveness, manage failover and recovery, and achieve exactly‑once processing guarantees.

In a Kafka‑style cluster, a node is considered alive when it maintains a heartbeat session with ZooKeeper and, if it is a follower, keeps replicating the leader's data without falling too far behind. The following code shows a simple heartbeat‑based liveness check:

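import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch only: Node, heartbeatACK and declareDeadOrFailed are illustrative names.
ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

// Fixed delay rather than fixed rate: the 15-second ACK wait is longer than
// the 10-second interval, so successive rounds must not overlap.
timer.scheduleWithFixedDelay(() -> {
    // removeIf avoids removing from the list while iterating over it
    slaveNodes.removeIf(node -> {
        if (node.heartbeatACK(15, TimeUnit.SECONDS)) {
            node.numNotAlive = 0;
            return false;
        }
        node.numNotAlive += 1;
        if (node.numNotAlive < 3) {
            return false;               // tolerate up to two consecutive misses
        }
        node.declareDeadOrFailed();
        // callback: leaderNodeApp.notifyNodeDeadOrFailed(node)
        return true;                    // third consecutive miss: drop the node
    });
}, 0, 10, TimeUnit.SECONDS);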

A leader node can similarly detect dead slaves:

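import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch only: Node and isDeadOrFailed are illustrative names.
ScheduledExecutorService leaderTimer = Executors.newSingleThreadScheduledExecutor();
leaderTimer.scheduleAtFixedRate(() -> {
    for (Node node : slaveNodes) {
        if (node.isDeadOrFailed) {
            // node cannot talk to ZooKeeper
        }
    }
}, 0, 10, TimeUnit.SECONDS);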
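
The miss-counting rule used by the liveness check can be isolated from the timer so that it is easy to test. The following is a minimal sketch, not the article's implementation: `HeartbeatTracker`, `record`, and `missesFor` are hypothetical names, and the limit of three misses simply mirrors the threshold used above.

```java
import java.util.HashMap;
import java.util.Map;

/** Tracks consecutive missed heartbeats per node (hypothetical helper, not a Kafka or ZooKeeper API). */
final class HeartbeatTracker {
    private final int maxMisses;
    private final Map<String, Integer> misses = new HashMap<>();

    HeartbeatTracker(int maxMisses) {
        this.maxMisses = maxMisses;
    }

    /**
     * Records the outcome of one heartbeat round for a node.
     * Returns true when the node has just reached the miss limit and should be declared dead.
     */
    boolean record(String nodeId, boolean acked) {
        if (acked) {
            misses.remove(nodeId);          // any ACK resets the streak
            return false;
        }
        int streak = misses.merge(nodeId, 1, Integer::sum);
        if (streak >= maxMisses) {
            misses.remove(nodeId);          // stop tracking a node that has been declared dead
            return true;
        }
        return false;
    }

    /** Current number of consecutive misses for a node. */
    int missesFor(String nodeId) {
        return misses.getOrDefault(nodeId, 0);
    }
}
```

With `new HeartbeatTracker(3)`, two misses followed by an ACK reset the streak to zero, and only three misses in a row make `record` return true.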

When a node fails, the system must decide how to handle failover. If only slaves fail, the remaining nodes can keep serving reads; if the master fails, a mechanism such as Keepalived, LVS, or a custom failover process is required to switch traffic to a replacement. High‑availability (HA) architectures are mentioned but not detailed.
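
As an illustration of what a custom failover mechanism might decide, the sketch below promotes the live slave that has replicated the most data. This selection rule is an assumption made for the example, not something the article specifies, and `Replica`, `replicatedOffset`, and `electNewMaster` are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical replica descriptor; the field names are assumptions for illustration. */
final class Replica {
    final String id;
    final boolean alive;
    final long replicatedOffset;

    Replica(String id, boolean alive, long replicatedOffset) {
        this.id = id;
        this.alive = alive;
        this.replicatedOffset = replicatedOffset;
    }
}

final class Failover {
    /** Picks the live replica with the highest replicated offset, or empty if none is alive. */
    static Optional<Replica> electNewMaster(List<Replica> slaves) {
        return slaves.stream()
                .filter(r -> r.alive)
                .max(Comparator.comparingLong((Replica r) -> r.replicatedOffset));
    }
}
```

Choosing the most caught-up replica limits how much acknowledged data is lost on promotion. It does not remove the need for fencing the old master, which this sketch leaves out.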

Recovering from failures involves more than restarting a slave. The article examines strategies for handling failed read/write requests, including retry, replay, or ignoring the error, and stresses the importance of understanding the exact recovery logic.
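
Two of the strategies named above, retrying and ignoring the error, can be combined in one small helper. This is a sketch under stated assumptions: `Recovery` and `retryOrIgnore` are hypothetical names, failures are modelled as unchecked exceptions, and backoff and logging are omitted.

```java
import java.util.function.Supplier;

/** Hypothetical recovery helper: retry a failed request a bounded number of times, then fall back. */
final class Recovery {
    /**
     * Runs the request up to maxAttempts times and returns the first successful result.
     * If every attempt throws, the error is ignored and the fallback value is returned.
     */
    static <T> T retryOrIgnore(Supplier<T> request, int maxAttempts, T fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.get();
            } catch (RuntimeException e) {
                // swallow and try again; a real system would log and back off here
            }
        }
        return fallback;
    }
}
```

Retrying a read is harmless, but retrying a write is only safe when the write is idempotent. That is why the recovery logic has to be understood exactly before a strategy is chosen.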

A concrete big‑data problem is introduced: computing the "reach" of a URL on Twitter within a three‑hour window. The proposed solution extracts all users who retweeted the URL, gathers their followers, deduplicates the set, and counts the unique users.

Sample SQL and Java‑style code illustrate how to fetch retweeters and their followers:

// 1. Get retweeters of a URL
List<String> getUrlToTweetersMap(String url_id) {
    // SELECT user_id FROM url_user WHERE url_id = ${url_id}
    return [...] ;
}

// 2. Get followers of a retweeter
List<String> getFollowers(String tweeter_id) {
    // SELECT id FROM users WHERE followee_id = ${tweeter_id}
    return [...] ;
}

// 3. Compute reach
// urlId identifies the URL whose reach is being measured
List<String> tweeters = getUrlToTweetersMap(urlId);
Set<String> uniqueFollowers = new HashSet<>();
for (String tweeterId : tweeters) {
    // a Set keeps each follower once: hash‑based deduplication
    uniqueFollowers.addAll(getFollowers(tweeterId));
}
int reach = uniqueFollowers.size();
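
For a version that can be run without a database, the same three steps can be sketched with in-memory maps standing in for the two SQL lookups. `ReachCalculator` and its fields are hypothetical names, and the maps are an assumption made purely for illustration.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** In-memory stand-in for the two lookups; the maps replace the SQL queries for illustration only. */
final class ReachCalculator {
    private final Map<String, List<String>> urlToTweeters;
    private final Map<String, List<String>> tweeterToFollowers;

    ReachCalculator(Map<String, List<String>> urlToTweeters,
                    Map<String, List<String>> tweeterToFollowers) {
        this.urlToTweeters = urlToTweeters;
        this.tweeterToFollowers = tweeterToFollowers;
    }

    /** Counts the distinct followers of everyone who tweeted the URL. */
    int reach(String urlId) {
        Set<String> uniqueFollowers = new HashSet<>();
        List<String> tweeters = urlToTweeters.getOrDefault(urlId, Collections.emptyList());
        for (String tweeter : tweeters) {
            // followers shared by several tweeters are stored only once
            uniqueFollowers.addAll(
                    tweeterToFollowers.getOrDefault(tweeter, Collections.emptyList()));
        }
        return uniqueFollowers.size();
    }
}
```

If alice is followed by carol and dave, and bob by dave and erin, a URL tweeted by alice and bob has a reach of 3, because dave is counted once.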

To scale this computation, the article suggests using Apache Storm/Trident, showing a topology that queries the two static states (URL→tweeters and tweeter→followers), expands lists, shuffles, groups by follower, and aggregates to obtain the unique count.

// Trident topology for reach calculation, exposed as a DRPC function named "reach"
TridentState urlToTweeters = topology.newStaticState(getUrlToTweetersState());
TridentState tweetersToFollowers = topology.newStaticState(getTweeterToFollowersState());

topology.newDRPCStream("reach")
    .stateQuery(urlToTweeters, new Fields("args"), new MapGet(), new Fields("tweeters"))
    .each(new Fields("tweeters"), new ExpandList(), new Fields("tweeter"))
    .shuffle()
    .stateQuery(tweetersToFollowers, new Fields("tweeter"), new MapGet(), new Fields("followers"))
    .parallelismHint(200)
    .each(new Fields("followers"), new ExpandList(), new Fields("follower"))
    .groupBy(new Fields("follower"))
    .aggregate(new One(), new Fields("one"))
    .parallelismHint(20)
    .aggregate(new Count(), new Fields("reach"));

The discussion then shifts to exactly‑once processing semantics. Three models are compared: at‑most‑once (possible data loss), at‑least‑once (possible duplicates), and exactly‑once (requires transactional guarantees). The article explains how transaction IDs, idempotent batches, and ordering can be used to achieve opaque transactions: each batch carries a unique transaction ID, and the stored record keeps the ID of the last applied batch together with a previous‑state field (prevReach), so that a replayed batch can be re‑applied without being counted twice.

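// Example of an opaque transaction update
// Stored record
{ transactionId:3, urlId:99, prevReach:2, reach:3 }
// Incoming batch with a partial count of 5 for the same URL
{ transactionId:4, urlId:99, reach:5 }
// Incoming ID is larger (4 > 3): a new batch, so build on the stored reach
//   newReach = 3 + 5 = 8  ->  { transactionId:4, urlId:99, prevReach:3, reach:8 }
// Had the incoming ID been equal (3 == 3), the batch would be a replay, so build on prevReach
//   newReach = 2 + 5 = 7  ->  { transactionId:3, urlId:99, prevReach:2, reach:7 }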
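
The update rule in this example can be written as a small immutable record type. This is a minimal sketch, assuming batches are committed in transaction-ID order, so an incoming ID is either equal to the stored one (a replay) or larger (a new batch). `ReachRecord` and `apply` are hypothetical names, and the fields follow the example record.

```java
/** Stored state for one URL; field names follow the example record (hypothetical class). */
final class ReachRecord {
    final long transactionId;
    final long prevReach;
    final long reach;

    ReachRecord(long transactionId, long prevReach, long reach) {
        this.transactionId = transactionId;
        this.prevReach = prevReach;
        this.reach = reach;
    }

    /** Applies one batch's partial count and returns the record to store. */
    ReachRecord apply(long batchTransactionId, long batchCount) {
        if (batchTransactionId == transactionId) {
            // Replay of the last batch: drop its earlier contribution by rebuilding from prevReach.
            return new ReachRecord(transactionId, prevReach, prevReach + batchCount);
        }
        if (batchTransactionId > transactionId) {
            // New batch: the current reach becomes the previous value.
            return new ReachRecord(batchTransactionId, reach, reach + batchCount);
        }
        // Smaller IDs violate the ordering assumption stated above.
        throw new IllegalArgumentException("stale transaction ID: " + batchTransactionId);
    }
}
```

Starting from `new ReachRecord(3, 2, 3)`, `apply(4, 5)` yields a reach of 8 with prevReach 3, while `apply(3, 5)` yields a reach of 7 and leaves prevReach at 2.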

Finally, the article mentions that two‑phase commit can be implemented with ZooKeeper when the underlying database lacks atomic operations, and concludes with a reference link.
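
The control flow of two-phase commit itself is short enough to sketch in memory. The code below is an illustration only: `Participant` and `TwoPhaseCoordinator` are hypothetical names, no ZooKeeper calls are made, and coordinator crashes, timeouts, and durable logging are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical participant interface; a real deployment would back these calls with ZooKeeper or RPC. */
interface Participant {
    boolean prepare();   // phase 1: vote yes (true) or no (false)
    void commit();       // phase 2: make the change permanent
    void rollback();     // phase 2: undo anything done during prepare
}

final class TwoPhaseCoordinator {
    /** Returns true if every participant voted yes and was committed, false if the transaction was aborted. */
    static boolean run(List<? extends Participant> participants) {
        List<Participant> prepared = new ArrayList<>();
        for (Participant p : participants) {
            if (p.prepare()) {
                prepared.add(p);
            } else {
                // a single "no" vote aborts: roll back everyone who already prepared
                for (Participant q : prepared) {
                    q.rollback();
                }
                return false;
            }
        }
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }
}
```

The coordinator commits only after all votes are in, so no participant makes a change permanent while another may still refuse.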
