Inside Xline Membership Change: A Source Code Walkthrough
This article examines how Xline performs safe cluster membership changes, comparing Joint Consensus and single‑step approaches, and provides a detailed source‑code walkthrough of leader and follower handling, configuration commits, node addition, removal, and the integration with the Curp protocol.
Background
In distributed systems, nodes may need to be added, removed, or replaced. Shutting down the whole cluster, editing a configuration file, and restarting causes downtime and invites manual errors, which is unacceptable for high‑availability services.
Raft Leader Constraint
Xline uses Raft as its backend consensus protocol. Raft's safety depends on there being at most one leader per term. If nodes are added without coordination, servers still using the old configuration and servers already using the new one can each assemble what they believe is a majority, so two servers can become leaders simultaneously. This is the election race illustrated by the election race diagram.
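To make the race concrete, the check below asks whether some majority of the old member set and some majority of the new member set can be completely disjoint. It is a minimal sketch: `majority` and `disjoint_majorities_possible` are hypothetical helper names, not part of Xline.

```rust
use std::collections::BTreeSet;

/// Smallest number of votes that forms a majority of `n` voters.
fn majority(n: usize) -> usize {
    n / 2 + 1
}

/// Returns true if some majority of `old` and some majority of `new`
/// share no node, i.e. two leaders could be elected at the same time.
/// (Hypothetical helper for illustration only.)
fn disjoint_majorities_possible(old: &BTreeSet<u64>, new: &BTreeSet<u64>) -> bool {
    let common = old.intersection(new).count();
    let old_only = old.len() - common;
    let new_only = new.len() - common;
    // Each side first uses the nodes that only it contains, then has to
    // take the rest of its majority from the shared nodes.
    let old_needs_shared = majority(old.len()).saturating_sub(old_only);
    let new_needs_shared = majority(new.len()).saturating_sub(new_only);
    old_needs_shared + new_needs_shared <= common
}
```

With old = {1, 2, 3}, jumping directly to {1, 2, 3, 4, 5} returns true, because {1, 2} and {3, 4, 5} are disjoint majorities. Adding only node 4 returns false.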
Raft‑Provided Solutions
The Raft paper defines two mechanisms for safe membership changes:
Joint Consensus
Single‑step (one‑by‑one) changes
Joint Consensus
Joint Consensus inserts an intermediate configuration during the change. The leader creates a joint configuration entry that combines the old and the new configuration and replicates it via AppendEntries. While the joint configuration is in effect, elections and commits require separate majorities of both the old and the new configuration. After the joint entry is committed, the leader creates a new configuration entry and replicates it in the same way.
At every stage of the change, two leaders cannot be elected:
Before the joint entry is committed, the old and the joint configuration may coexist. A candidate still on the old configuration needs a majority of the old members, and a candidate on the joint configuration needs majorities of both the old and the new members. Either way a majority of the old members is required, which prevents a split‑brain.
After the joint entry is committed but before the new entry is created, the old and the joint configuration may still coexist, yet only nodes that already hold the joint entry can win an election: a majority of nodes hold it and will not vote for a candidate whose log lacks it, so a second leader cannot form.
After the new entry is created but before it is committed, three configurations (old, joint, new) may coexist. A node still on the old configuration cannot be elected for the reason above, and candidates on the joint or the new configuration both need a majority of the new members, so at most one of them can win.
Once the new entry is committed, the old configuration is discarded, leaving a single, consistent configuration that alone governs elections and commits.
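The rule that applies while the joint configuration is in effect can be sketched as a vote-counting check. This is an illustrative model under the Raft paper's definition of joint consensus; `has_majority` and `joint_quorum_reached` are hypothetical names, and Xline does not currently implement this mode.

```rust
use std::collections::BTreeSet;

/// True if the nodes in `votes` include a majority of `config`.
fn has_majority(config: &BTreeSet<u64>, votes: &BTreeSet<u64>) -> bool {
    let granted = config.intersection(votes).count();
    granted >= config.len() / 2 + 1
}

/// Under the joint configuration, winning an election or committing an
/// entry needs separate majorities of both the old and the new members.
/// (Hypothetical helper for illustration only.)
fn joint_quorum_reached(
    old: &BTreeSet<u64>,
    new: &BTreeSet<u64>,
    votes: &BTreeSet<u64>,
) -> bool {
    has_majority(old, votes) && has_majority(new, votes)
}
```

For old = {1, 2, 3} and new = {3, 4, 5}, the votes {3, 4, 5} are not enough because only one old member agreed, while {1, 3, 4} satisfies both sides.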
Single‑step Membership Change
The alternative changes the cluster one node at a time. Each step adds or removes a single node, which ensures that any majority of the old configuration and any majority of the new configuration overlap in at least one node. That shared node can vote only once per term, which guarantees that only one leader can exist. Complex changes are expressed as a series of single‑step operations.
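As a sketch of how a larger change decomposes into single steps, the function below expands the difference between two member sets into one-node operations. The names `Step` and `plan_single_steps` are hypothetical, and the ordering (all additions first, then removals) is only one possible choice; each step would still have to be committed before the next one is proposed.

```rust
use std::collections::BTreeSet;

/// One single-node membership operation (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step {
    Add(u64),
    Remove(u64),
}

/// Expands a change from `old` to `new` into single-node steps:
/// additions first, then removals, each in ascending node-ID order.
fn plan_single_steps(old: &BTreeSet<u64>, new: &BTreeSet<u64>) -> Vec<Step> {
    let adds = new.difference(old).map(|&id| Step::Add(id));
    let removes = old.difference(new).map(|&id| Step::Remove(id));
    adds.chain(removes).collect()
}
```

Replacing nodes 1 and 2 with nodes 4 and 5 in a cluster {1, 2, 3} therefore takes four steps: add 4, add 5, remove 1, remove 2.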
Integration with Curp
Membership changes affect the front‑end Curp protocol. Curp clients broadcast Propose requests to all nodes and consider a request committed when the number of successful responses exceeds the super‑quorum. After a membership change, the super‑quorum may increase, causing previously committed requests to become invalid. Xline adds a cluster_version field to client requests; each membership change increments this version. Servers reject stale requests, prompting clients to fetch the latest configuration and retry.
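The version check can be modeled with a few lines. This is a minimal sketch, not Xline's code: `ProposeRequest`, `ProposeError`, `Server`, and their methods are hypothetical, and `super_quorum` uses the formula from the CURP paper (f + ⌈f/2⌉ + 1 for a cluster of 2f + 1 nodes), which may differ from how Xline computes it.

```rust
/// Hypothetical request carrying the membership version the client knows.
struct ProposeRequest {
    cluster_version: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum ProposeError {
    /// The client's view of the membership does not match the server's.
    WrongClusterVersion { current: u64 },
}

/// Hypothetical server state: only the membership version matters here.
struct Server {
    cluster_version: u64,
}

impl Server {
    /// Every membership change increments the version.
    fn apply_membership_change(&mut self) {
        self.cluster_version += 1;
    }

    /// Rejects requests whose version does not match the server's.
    fn handle_propose(&self, req: &ProposeRequest) -> Result<(), ProposeError> {
        if req.cluster_version != self.cluster_version {
            return Err(ProposeError::WrongClusterVersion {
                current: self.cluster_version,
            });
        }
        Ok(())
    }
}

/// Super-quorum per the CURP paper, assuming n = 2f + 1 nodes.
fn super_quorum(n: usize) -> usize {
    let f = n.saturating_sub(1) / 2;
    f + (f + 1) / 2 + 1
}
```

Under this formula a 3-node cluster needs 3 responses and a 5-node cluster needs 4, so a client that still counts against the old cluster size would use the wrong threshold. The rejected request tells it to refresh its view and retry.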
Leader‑Side Handling
When a leader receives a ProposeConfChangeRequest, it first checks the cluster_version. If it matches, the leader validates the change via check_new_config, appends a configuration entry to the log, and records a fallback context for potential rollback. The new configuration is applied immediately without waiting for commit, mirroring the Raft paper’s approach.
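pub(super) fn handle_propose_conf_change(
    &self,
    propose_id: ProposeId,
    conf_changes: Vec<ConfChange>,
) -> Result<(), CurpError> {
    // `st_r` and `log_w` are obtained in code omitted from this excerpt.
    // Validate the requested change against the current configuration.
    self.check_new_config(&conf_changes)?;
    // Append the configuration entry to the log.
    let entry = log_w.push(st_r.term, propose_id, conf_changes.clone())?;
    debug!("{} gets new log[{}]", self.id(), entry.index);
    // Apply the new configuration immediately, before the entry is committed.
    let (addrs, name, is_learner) = self.apply_conf_change(conf_changes);
    self.ctx.last_conf_change_idx.store(entry.index, Ordering::Release);
    // Remember how to undo the change in case the entry is overwritten.
    let _ig = log_w.fallback_contexts.insert(
        entry.index,
        FallbackContext::new(Arc::clone(&entry), addrs, name, is_learner),
    );
    // ...
}
Follower‑Side Handling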
Followers process configuration changes inside handle_append_entries. When new entries arrive, the follower separates normal log entries from configuration entries, applies any pending fallback contexts for overwritten configuration entries, and then applies the new configuration entries, recording their fallback contexts for possible rollback.
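The apply-first, roll-back-later bookkeeping used on both the leader and the follower can be modeled in isolation. The following is a minimal sketch, not Xline's implementation: `Membership`, `Change`, `apply`, `truncate_from`, and `commit` are hypothetical names, and the undo step assumes that an added node was absent before and a removed node was present.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A single-node membership change (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Change {
    Add(u64),
    Remove(u64),
}

#[derive(Default)]
struct Membership {
    members: BTreeSet<u64>,
    /// Log index -> change applied at that index but not yet committed.
    fallback: BTreeMap<u64, Change>,
}

impl Membership {
    /// Applies a change as soon as its entry is appended and records it
    /// so that it can be undone later.
    fn apply(&mut self, index: u64, change: Change) {
        match change {
            Change::Add(id) => { self.members.insert(id); }
            Change::Remove(id) => { self.members.remove(&id); }
        }
        self.fallback.insert(index, change);
    }

    /// Undoes every uncommitted change at `from_index` or later, newest
    /// first, as a follower must when those entries are overwritten.
    fn truncate_from(&mut self, from_index: u64) {
        let overwritten = self.fallback.split_off(&from_index);
        for (_, change) in overwritten.into_iter().rev() {
            match change {
                Change::Add(id) => { self.members.remove(&id); }
                Change::Remove(id) => { self.members.insert(id); }
            }
        }
    }

    /// Drops undo information for entries up to and including `commit_index`.
    fn commit(&mut self, commit_index: u64) {
        self.fallback = self.fallback.split_off(&(commit_index + 1));
    }
}
```

Rolling back in descending index order matters when several uncommitted changes touch the same node. Xline's own FallbackContext carries more state (addrs, name, is_learner) than this model, as the excerpt that follows shows.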
pub(super) fn handle_append_entries(
    &self,
    term: u64,
    leader_id: ServerId,
    prev_log_index: LogIndex,
    prev_log_term: u64,
    entries: Vec<LogEntry<C>>,
    leader_commit: LogIndex,
) -> Result<u64, (u64, LogIndex)> {
    // `log_w` is obtained in code omitted from this excerpt.
    let (cc_entries, fallback_indexes) =
        log_w.try_append_entries(entries, prev_log_index, prev_log_term)?;
    // Roll back overwritten configuration entries, newest first.
    for idx in fallback_indexes.iter().sorted().rev() {
        let info = log_w.fallback_contexts.remove(idx).unwrap();
        if let EntryData::ConfChange(ref conf_change) = info.origin_entry.entry_data {
            let changes = conf_change.clone();
            self.fallback_conf_change(changes, info.addrs, info.name, info.is_learner);
        }
    }
    // Apply the newly received configuration entries and record their fallback contexts.
    for e in cc_entries {
        if let EntryData::ConfChange(ref cc) = e.entry_data {
            let (addrs, name, is_learner) = self.apply_conf_change(cc.clone());
            let _ig = log_w.fallback_contexts.insert(
                e.index,
                FallbackContext::new(Arc::clone(&e), addrs, name, is_learner),
            );
        }
    }
    // ...
}
Commit Phase and Node Shutdown
After a configuration change is committed, the system checks whether the change removes the current node. If so, the node initiates a self‑shutdown. Only the leader can reach this point because it is the only node that can commit a removal of itself.
async fn worker_as<C: Command, CE: CommandExecutor<C>, RC: RoleChange>(
    entry: Arc<LogEntry<C>>,
    prepare: Option<C::PR>,
    ce: &CE,
    curp: &RawCurp<C, RC>,
) -> bool {
    let id = curp.id();
    let success = match entry.entry_data {
        EntryData::ConfChange(ref conf_change) => {
            // Shut down only if this committed change removes the local node.
            let shutdown_self = conf_change
                .iter()
                .any(|c| c.change_type() == ConfChangeType::Remove && c.node_id == id);
            if shutdown_self { curp.shutdown_trigger().self_shutdown(); }
            true
        }
        _ => false,
    };
    ce.trigger(entry.inflight_id(), entry.index);
    success
}
Adding a New Node
When a node starts, it receives an InitialClusterState enum indicating whether it is part of a brand‑new cluster or joining an existing one. For a new cluster, each node can compute a globally unique ID locally. For joining an existing cluster, the node fetches the current cluster information via get_cluster_info_from_remote to inherit the correct IDs and avoid duplication.
let cluster_info = match *cluster_config.initial_cluster_state() {
    InitialClusterState::New => init_cluster_info,
    InitialClusterState::Existing => {
        get_cluster_info_from_remote(&init_cluster_info, server_addr_str, &name, Duration::from_secs(3)).await?
    }
    _ => unreachable!("xline only supports two initial cluster states: new, existing"),
};
Node Removal and Pre‑Vote Check
A removed node that is not shut down no longer receives heartbeats, so it times out and keeps sending vote requests, wasting resources. Two naive approaches, shutting the node down before the configuration entry is committed or after it, both have drawbacks. Xline instead adds a check in the pre‑vote phase: if a candidate is no longer present in the current configuration and no uncommitted removal could still be rolled back to re‑add it, the receiving node replies with a special shutdown_candidate flag, prompting the candidate to shut down.
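The decision itself reduces to two boolean conditions, which the standalone model below makes explicit. It is a simplified sketch with hypothetical names (`PreVoteReply`, `check_candidate`); the list of uncommitted removals stands in for the fallback contexts that Xline inspects.

```rust
use std::collections::BTreeSet;

/// Outcome of the membership check in the pre-vote handler (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum PreVoteReply {
    /// Continue with the normal pre-vote checks.
    Proceed,
    /// The candidate was removed for good and should shut itself down.
    ShutdownCandidate,
}

/// `uncommitted_removals` lists node IDs whose removal is applied locally
/// but not yet committed, so it could still be rolled back.
fn check_candidate(
    members: &BTreeSet<u64>,
    uncommitted_removals: &[u64],
    candidate: u64,
) -> PreVoteReply {
    let is_member = members.contains(&candidate);
    let removal_may_roll_back = uncommitted_removals.contains(&candidate);
    if !is_member && !removal_may_roll_back {
        PreVoteReply::ShutdownCandidate
    } else {
        PreVoteReply::Proceed
    }
}
```

A candidate whose removal is still uncommitted is allowed to proceed, because a rollback could make it a member again. Only a candidate that is absent and has no pending removal is told to shut down. The actual handler follows.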
pub(super) fn handle_pre_vote(
    &self,
    term: u64,
    candidate_id: ServerId,
    last_log_index: LogIndex,
    last_log_term: u64,
) -> Result<(u64, Vec<PoolEntry<C>>), Option<u64>> {
    // `log_r` is obtained in code omitted from this excerpt.
    let contains_candidate = self.cluster().contains(candidate_id);
    // True if an uncommitted entry removes the candidate and may still be rolled back.
    let remove_candidate_is_not_committed = log_r.fallback_contexts.iter().any(|(_, ctx)| {
        match ctx.origin_entry.entry_data {
            EntryData::ConfChange(ref cc) => cc.iter().any(|c| {
                matches!(c.change_type(), ConfChangeType::Remove) && c.node_id == candidate_id
            }),
            _ => false,
        }
    });
    if !contains_candidate && !remove_candidate_is_not_committed {
        return Err(None); // indicate shutdown_candidate = true
    }
    // ...
}
Summary of Strategies
Two main strategies for safe cluster membership changes are:
Joint Consensus – uses an intermediate configuration to avoid dual leaders.
Single‑step changes – simplifies implementation by changing one node at a time, at the cost of reduced flexibility.
Xline currently employs the single‑step approach and plans to add Joint Consensus support in the future.
Repository
For the full source code, see https://github.com/xline-kv/Xline
