Why Distributed Databases Need a New Architecture: From Two‑Phase Commit to Raft

This article examines the pressures driving banks to redesign their core database systems, compares three distributed‑access approaches, explains the components of modern distributed databases, analyzes two‑phase and three‑phase commit issues, and evaluates consensus algorithms, CAP/BASE trade‑offs, GTM design, and point‑in‑time recovery.

dbaplus Community

Mar 8, 2020

Industry Background

Inclusive finance and fintech increase the load on financial information systems, exposing database bottlenecks and making distributed refactoring of databases essential.

Paths to Distributed Database Refactoring

Distributed access client – highest application invasiveness.

Distributed access middleware – moderate invasiveness.

Distributed database – lowest invasiveness but most complex to design and implement.

Overall Architecture of Distributed Databases

Typical architectures consist of three indispensable components:

Coordinator (access) node: parses SQL, generates distributed execution plans, forwards queries, aggregates results.

Data nodes: store data and perform computation.

Global Transaction Manager (GTM): generates global transaction IDs and guarantees global consistency.
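
The division of labour between coordinator and data nodes can be sketched with a scatter‑gather aggregate. This is a minimal illustration, not any product's implementation: the `Coordinator` and `DataNode` classes and the `partial_sum` method are hypothetical names, rows are plain dictionaries, and the GTM is left out entirely.

```python
from concurrent.futures import ThreadPoolExecutor


class DataNode:
    """Hypothetical data node: stores rows and computes partial results."""

    def __init__(self, rows):
        self.rows = rows

    def partial_sum(self, column, predicate):
        # Computation is pushed down to where the data lives.
        return sum(row[column] for row in self.rows if predicate(row))


class Coordinator:
    """Hypothetical coordinator: forwards the query and merges the partials."""

    def __init__(self, data_nodes):
        self.data_nodes = data_nodes

    def sum(self, column, predicate=lambda row: True):
        # Scatter: ask every data node for its partial aggregate in parallel.
        with ThreadPoolExecutor() as pool:
            partials = list(
                pool.map(lambda dn: dn.partial_sum(column, predicate),
                         self.data_nodes)
            )
        # Gather: combine the partial aggregates into the final result.
        return sum(partials)


if __name__ == "__main__":
    cn = Coordinator([
        DataNode([{"amount": 10}, {"amount": 5}]),
        DataNode([{"amount": 7}]),
    ])
    print(cn.sum("amount"))
```

Only the small partial results travel over the network, which is why the coordinator can stay stateless with respect to user data.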

Two‑Phase Commit (2PC) Issues

In a PGXC‑style deployment the phases are:

Coordinator (CN) prepares.

All data nodes (DN) prepare.

CN commits.

All DNs commit.

If the coordinator crashes after sending COMMIT to only some DNs, the DNs that have prepared but not yet received COMMIT are blocked: they hold their locks and cannot determine the final outcome on their own. The pgxc_clean process reconciles state across surviving nodes, but if a prepared DN is the sole participant it has no peer to consult and the transaction remains indeterminate.
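
The blocking window can be reproduced with a toy simulation. The sketch below is an assumption‑laden model, not PGXC code: `DataNode`, `two_phase_commit`, and `resolve_in_doubt` are hypothetical names, and `resolve_in_doubt` only loosely mirrors the idea behind a cleanup tool such as pgxc_clean (finish a transaction based on what the other participants already decided).

```python
from enum import Enum


class State(Enum):
    INIT = "init"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


class DataNode:
    """Hypothetical 2PC participant tracking the state of one transaction."""

    def __init__(self, name):
        self.name = name
        self.state = State.INIT

    def prepare(self):
        self.state = State.PREPARED
        return True  # this toy participant always votes yes

    def commit(self):
        self.state = State.COMMITTED

    def abort(self):
        self.state = State.ABORTED


def two_phase_commit(nodes, crash_after_commits=None):
    """Run 2PC; optionally simulate a coordinator crash after N COMMIT messages."""
    if not all(node.prepare() for node in nodes):
        for node in nodes:
            node.abort()
        return
    for sent, node in enumerate(nodes):
        if crash_after_commits is not None and sent == crash_after_commits:
            return  # coordinator dies; the remaining nodes stay PREPARED
        node.commit()


def resolve_in_doubt(nodes):
    """Finish an in-doubt transaction using what the participants know."""
    states = {node.state for node in nodes}
    if State.COMMITTED in states:
        decision, action = "committed", DataNode.commit
    elif State.ABORTED in states:
        decision, action = "aborted", DataNode.abort
    else:
        # Nobody saw the decision: the outcome cannot be inferred from peers.
        return "indeterminate"
    for node in nodes:
        if node.state is State.PREPARED:
            action(node)
    return decision
```

With three nodes and a crash after the first COMMIT, the two prepared nodes can be rolled forward because one peer already committed. With a single prepared node there is no such evidence, so the helper reports the transaction as indeterminate.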

Three‑phase commit splits voting into a can‑commit phase and a pre‑commit phase before the final commit, and adds timeout mechanisms that let participants elect a new coordinator when the original fails. This removes most blocking at the cost of an extra round trip of latency, which is mitigated on high‑speed networks (10 GbE, InfiniBand, RoCE).
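
The timeout mechanism is what lets a participant make progress without the coordinator. A minimal sketch, assuming the textbook 3PC rule (a participant that has reached pre‑commit knows every peer voted yes and may commit on timeout; one that has not must abort). The `Phase` enum and `on_coordinator_timeout` function are hypothetical names, and coordinator re‑election is not modelled.

```python
from enum import Enum


class Phase(Enum):
    INIT = "init"
    READY = "ready"                  # voted yes in the can-commit phase
    PRE_COMMITTED = "pre_committed"  # acknowledged the pre-commit message
    COMMITTED = "committed"
    ABORTED = "aborted"


def on_coordinator_timeout(phase):
    """Return the phase a participant moves to when the coordinator is silent."""
    if phase is Phase.PRE_COMMITTED:
        # Pre-commit is only sent after a unanimous yes vote, so committing is safe.
        return Phase.COMMITTED
    if phase in (Phase.INIT, Phase.READY):
        # No evidence that everyone voted yes: give up and release locks.
        return Phase.ABORTED
    return phase  # already final, nothing to do
```

This rule assumes nodes fail by crashing; it does not by itself protect against a network partition that splits participants into groups in different phases.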

CAP vs. BASE Trade‑offs

Because network partitions cannot be ruled out, a distributed system must tolerate them (P). When a partition occurs, designers choose between Availability (A) and Consistency (C):

2PC sacrifices A to ensure C (strong consistency).

BASE (Basically Available, Soft state, Eventual consistency) sacrifices C to achieve higher A, using logs to reach eventual consistency.
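
The log‑based route to eventual consistency can be sketched as follows. This is an illustrative model only: `Primary`, `Replica`, and `catch_up` are hypothetical names, and the log is an in‑memory list rather than a durable, shipped log.

```python
class Primary:
    """Accepts writes immediately and records them in an ordered log."""

    def __init__(self):
        self.data = {}
        self.log = []

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))  # acknowledged without waiting for replicas


class Replica:
    """Serves possibly stale reads and converges by replaying the log."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries already applied

    def catch_up(self, log):
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)


if __name__ == "__main__":
    primary, replica = Primary(), Replica()
    primary.write("balance", 100)
    print(replica.data.get("balance"))  # stale read: replica has not caught up
    replica.catch_up(primary.log)
    print(replica.data.get("balance"))  # converged after replaying the log
```

The write path never waits on the replica, which is where the availability gain comes from; the price is the window in which the replica returns stale data.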

Advantages of Raft

Raft is easier to understand and implement than Paxos: it elects at most one leader per term, has that leader replicate its log to followers, and states its safety guarantees clearly. It is used in TiDB and etcd, and through etcd in Kubernetes.
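
The single‑leader property follows from two rules: each node grants at most one vote per term, and a candidate needs a strict majority. The sketch below shows only that voting rule. `RaftNode` and `run_election` are hypothetical names, and the simplification is significant: real Raft also checks that the candidate's log is at least as up to date as the voter's, and uses randomized election timeouts, neither of which is modelled here.

```python
class RaftNode:
    """Holds the per-node voting state used during leader election."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.current_term = 0
        self.voted_for = None

    def request_vote(self, term, candidate_id):
        if term < self.current_term:
            return False  # stale candidate
        if term > self.current_term:
            # A newer term resets the vote.
            self.current_term = term
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False  # already voted for someone else in this term


def run_election(cluster, candidate_id, term):
    """Return True if the candidate collects a strict majority of votes."""
    votes = sum(node.request_vote(term, candidate_id) for node in cluster)
    return votes > len(cluster) // 2
```

In a five‑node cluster, once candidate 0 wins term 1, a second candidate asking for votes in the same term is refused by every node and has to retry in a later term.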

Coordinator Node (CN) Design

To provide SQL compatibility the CN can reuse existing MySQL or PostgreSQL parsers and server layers, inheriting native syntax support. The main design question is metadata placement:

If metadata resides on the CN, updates must be synchronized across CN instances; a CN failure can block DDL operations.

If metadata is external, a separate metadata service is required, but it removes the single point of failure.

Data Node (DN) Design

High availability: each DN should implement redundancy (master‑slave, streaming replication) and may use Paxos, Raft, or quorum protocols for strong consistency.

Online scaling: consistent hashing (or consistent hash with virtual nodes) reduces data movement when adding or removing nodes, balancing load while minimizing I/O.
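
The effect of consistent hashing on data movement can be checked with a small ring. This is a sketch under stated assumptions: the `HashRing` class, the `dn1`..`dn4` node names, the choice of MD5 as the hash, and 100 virtual nodes per physical node are all illustrative, not taken from any particular product.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a position on the ring (first 8 bytes of its MD5)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node owns many points, which evens out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(pos, name) for pos, name in self._ring if name != node]

    def lookup(self, key):
        # First virtual node clockwise from the key, wrapping around the ring.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    keys = [f"account:{i}" for i in range(1000)]
    ring = HashRing(["dn1", "dn2", "dn3"])
    before = {k: ring.lookup(k) for k in keys}
    ring.add_node("dn4")
    moved = [k for k in keys if ring.lookup(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved after adding dn4")
```

Every key that moves goes to the new node, and the rest stay where they were. With plain modulo hashing, changing the node count from three to four would instead remap most keys.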

Global Transaction Manager (GTM) Design

GTM provides a global snapshot and transaction‑ID service to ensure read‑side consistency. To avoid a single point of failure, GTM can store transaction IDs in a strongly consistent external store such as etcd, allowing any GTM instance to recover state without synchronous master‑slave replication.

Because every transaction contacts GTM, it becomes a performance bottleneck. Mitigation strategies include:

GTM‑Free: bypass GTM for weakly consistent reads.

GTM‑Lite: route only global transactions through GTM, allowing local transactions to proceed without GTM involvement.
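
The GTM‑Lite idea can be sketched as a routing decision in front of a toy GTM. Everything here is hypothetical: the `GTM` class, its `begin`, `snapshot`, and `finish` methods, and the `begin_transaction` helper are illustrative names, and the snapshot is reduced to the set of in‑progress transaction IDs.

```python
import itertools
import threading


class GTM:
    """Toy global transaction manager: hands out IDs and snapshots."""

    def __init__(self):
        self._next_xid = itertools.count(1)
        self._active = set()
        self._lock = threading.Lock()
        self.requests = 0  # counts round trips, to make the bottleneck visible

    def begin(self):
        with self._lock:
            self.requests += 1
            xid = next(self._next_xid)
            self._active.add(xid)
            return xid

    def snapshot(self):
        # Transactions in this set are still running; readers must not see them.
        with self._lock:
            self.requests += 1
            return frozenset(self._active)

    def finish(self, xid):
        with self._lock:
            self._active.discard(xid)


def begin_transaction(gtm, nodes_touched):
    """GTM-Lite style routing: only multi-node transactions contact the GTM."""
    if len(set(nodes_touched)) > 1:
        return ("global", gtm.begin())
    return ("local", None)  # the single data node manages this one itself
```

A transaction confined to one data node returns `("local", None)` and leaves `gtm.requests` unchanged, so GTM load grows with the share of cross‑node transactions rather than with total throughput.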

Point‑In‑Time Recovery (PITR) for Distributed Databases

PITR relies on a base backup plus continuous WAL archiving. In a distributed setting each node backs up its own data, but the backup must be taken after all distributed transactions have reached a consistent commit point (e.g., after a barrier). Restoration replays WALs on all nodes up to the same consistency point, ensuring a globally consistent state.
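
Choosing the recovery target can be sketched as finding the newest barrier that every node has archived. The model below is an assumption: WAL records are plain dictionaries, barriers carry an `id` and a coordinator‑assigned `time` that is identical on all nodes, and `pick_recovery_barrier` and `replay_until` are hypothetical helpers rather than any database's recovery API.

```python
def pick_recovery_barrier(node_wals, target_time):
    """Return the newest barrier at or before target_time present on every node."""
    common = None
    for records in node_wals.values():
        seen = {
            (r["time"], r["id"])
            for r in records
            if r["type"] == "barrier" and r["time"] <= target_time
        }
        common = seen if common is None else common & seen
    return max(common)[1] if common else None


def replay_until(records, barrier_id):
    """Apply write records in order, stopping at the chosen barrier."""
    applied = []
    for r in records:
        if r["type"] == "barrier" and r["id"] == barrier_id:
            break
        if r["type"] == "write":
            applied.append(r["data"])
    return applied


if __name__ == "__main__":
    wals = {
        "dn1": [
            {"type": "write", "data": "a"},
            {"type": "barrier", "id": "b1", "time": 100},
            {"type": "write", "data": "b"},
            {"type": "barrier", "id": "b2", "time": 200},
        ],
        "dn2": [
            {"type": "write", "data": "x"},
            {"type": "barrier", "id": "b1", "time": 100},
            {"type": "write", "data": "y"},
        ],
    }
    barrier = pick_recovery_barrier(wals, target_time=250)
    print(barrier, {n: replay_until(w, barrier) for n, w in wals.items()})
```

In the example, dn2 never archived barrier b2, so both nodes are replayed only up to b1 even though dn1 holds later WAL. Stopping every node at the same barrier is what keeps a distributed transaction from being restored on one node and missing on another.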
