
Introducing RaftKeeper: A High‑Performance Raft‑Based Distributed Coordination Service

RaftKeeper is an open‑source distributed consensus service written in C++ and based on the Raft protocol. It offers more than twice the throughput of traditional coordination services, sub‑second latency, five‑nines availability, and full ZooKeeper compatibility, targeting high‑performance OLAP workloads and large‑scale backend scenarios.

DataFunTalk

Background : In large‑scale distributed systems with hundreds of servers, machine failures and network jitter can cause severe crashes. Yahoo open‑sourced ZooKeeper to provide coordination for such systems, and it became a core component of Hadoop, HBase, and ClickHouse. To overcome ZooKeeper's throughput and latency limits in ClickHouse, JD Retail's OLAP team built RaftKeeper, a Raft‑based consensus service written in C++ that is now fully open‑sourced for the community.

Technical Architecture : RaftKeeper implements the Raft protocol to guarantee sequential consistency and strict read‑write ordering within a session. Data resides in memory and is persisted through snapshots plus an operation log. The execution framework uses pipelining and batch execution to greatly increase throughput.

Figure 1: RaftKeeper Architecture
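The snapshot‑plus‑operation‑log persistence described above can be sketched as follows. This is a minimal illustration, not RaftKeeper's actual code: the names `LogEntry`, `Snapshot`, and `InMemoryStore` are hypothetical, and the sketch assumes a flat path‑to‑value map and a log that is sorted by index. Recovery loads the snapshot and then replays only the log entries newer than it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical types for illustration; not RaftKeeper's real classes.
struct LogEntry {
    std::uint64_t index;  // position in the replicated log (starts at 1)
    std::string path;     // node path, e.g. "/config/a"
    std::string value;    // new value; an empty value means "delete" in this sketch
};

struct Snapshot {
    std::uint64_t last_index = 0;              // last log index folded into the snapshot
    std::map<std::string, std::string> nodes;  // full copy of the state at that index
};

class InMemoryStore {
public:
    // Apply one committed entry to the in-memory state, strictly in log order.
    void apply(const LogEntry& e) {
        assert(e.index == last_applied_ + 1 && "entries must be applied in order");
        if (e.value.empty()) {
            nodes_.erase(e.path);
        } else {
            nodes_[e.path] = e.value;
        }
        last_applied_ = e.index;
    }

    // Capture the current state so older log entries can later be discarded.
    Snapshot takeSnapshot() const {
        Snapshot snap;
        snap.last_index = last_applied_;
        snap.nodes = nodes_;
        return snap;
    }

    // Rebuild the state: start from the snapshot, replay only newer entries.
    static InMemoryStore recover(const Snapshot& snap,
                                 const std::vector<LogEntry>& log) {
        InMemoryStore store;
        store.nodes_ = snap.nodes;
        store.last_applied_ = snap.last_index;
        for (const auto& e : log) {
            if (e.index > snap.last_index) store.apply(e);
        }
        return store;
    }

    // Returns an empty string when the path does not exist.
    std::string get(const std::string& path) const {
        auto it = nodes_.find(path);
        return it == nodes_.end() ? std::string() : it->second;
    }

    std::uint64_t lastApplied() const { return last_applied_; }

private:
    std::map<std::string, std::string> nodes_;
    std::uint64_t last_applied_ = 0;
};
```

Under these assumptions, a store recovered from a snapshot taken at index 2 plus a log holding entries 1 through 3 ends up in the same state as one that applied all three entries directly.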

Core Advantages :

1. High Performance : RaftKeeper delivers more than twice the throughput and capacity of traditional coordination services, halves latency, and reduces resource consumption, as shown in its benchmark.

Figure 2: RaftKeeper Performance Test

2. High Availability : RaftKeeper provides five‑nines availability, eliminates single points of failure, guarantees that no data is lost once a write succeeds, and supports cross‑datacenter coordination.

3. Full ZooKeeper Compatibility : Compatible with ZooKeeper clients, visualization, and monitoring tools; includes data conversion utilities for seamless migration.
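To put the availability claim in item 2 into numbers, the sketch below works through two pieces of arithmetic. Raft commits a write once a majority of nodes has stored it, so a cluster keeps accepting writes as long as a majority is alive, and a five‑nines target leaves a yearly downtime budget of a little over five minutes. The function names are hypothetical, and the article does not state RaftKeeper's cluster sizes, so any node counts passed in are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Smallest number of nodes that forms a Raft majority in a cluster of n nodes.
constexpr std::size_t quorumSize(std::size_t n) { return n / 2 + 1; }

// Node failures a cluster of n can survive while still committing writes.
// An empty cluster tolerates nothing.
constexpr std::size_t toleratedFailures(std::size_t n) {
    return n == 0 ? 0 : n - quorumSize(n);
}

// Yearly downtime budget in minutes for an availability target such as 0.99999,
// using a 365-day year.
constexpr double downtimeMinutesPerYear(double availability) {
    return (1.0 - availability) * 365.0 * 24.0 * 60.0;
}
```

For example, `toleratedFailures(3)` is 1 and `toleratedFailures(5)` is 2, while a 4‑node cluster still tolerates only one failure, which is why odd cluster sizes are the usual choice.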

Optimization Path : RaftKeeper serializes log processing to maintain order, employing batch and pipeline execution as suggested by the Raft paper. It uses a ring buffer for hot log data and a segmented hash table for the state machine to avoid pause‑inducing rehashing, along with extensive I/O and lock‑granularity optimizations.
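The segmented hash table idea can be sketched as below. This is an assumption about the general technique, not RaftKeeper's actual container: `SegmentedMap` is a hypothetical name, and the sketch simply spreads keys over a fixed number of independent `std::unordered_map` segments. Each segment grows and rehashes on its own, so a single rehash touches only a fraction of the keys instead of pausing on the whole table.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical illustration; not RaftKeeper's real state-machine container.
class SegmentedMap {
public:
    // The segment count is fixed for the lifetime of the map.
    explicit SegmentedMap(std::size_t segments) : segments_(segments) {
        assert(segments > 0);
    }

    // Insert or overwrite; only the owning segment can rehash as a result.
    void put(const std::string& key, const std::string& value) {
        segmentFor(key)[key] = value;
    }

    // Returns nullptr when the key is absent.
    const std::string* get(const std::string& key) const {
        const Segment& seg = segments_[indexFor(key)];
        auto it = seg.find(key);
        return it == seg.end() ? nullptr : &it->second;
    }

    // Returns true if the key existed and was removed.
    bool erase(const std::string& key) {
        return segmentFor(key).erase(key) > 0;
    }

    // Total number of keys across all segments.
    std::size_t size() const {
        std::size_t total = 0;
        for (const auto& seg : segments_) total += seg.size();
        return total;
    }

private:
    using Segment = std::unordered_map<std::string, std::string>;

    // Pick the segment from the key's hash.
    std::size_t indexFor(const std::string& key) const {
        return std::hash<std::string>{}(key) % segments_.size();
    }

    Segment& segmentFor(const std::string& key) {
        return segments_[indexFor(key)];
    }

    std::vector<Segment> segments_;
};
```

With 16 segments, for instance, a rehash triggered by one insert moves roughly one sixteenth of the stored keys, at the cost of an extra hash‑and‑modulo step per lookup.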

The project benefits from eBay’s NuRaft framework and ClickHouse’s high‑performance libraries.
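The ring buffer for hot log data mentioned in the optimization path can be illustrated with a small fixed‑capacity cache of the most recent log entries. This is a sketch under stated assumptions rather than RaftKeeper's or NuRaft's implementation: `HotLogBuffer` is a hypothetical name, payloads are plain strings, and log indices start at 1 as in the Raft paper. Recent entries are found in constant time by index, and the oldest entry is overwritten once the buffer is full.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration; not RaftKeeper's real log cache.
class HotLogBuffer {
public:
    explicit HotLogBuffer(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    // Store the payload under the next log index and return that index.
    // When the buffer is full, this overwrites the oldest cached entry.
    std::uint64_t append(std::string payload) {
        const std::uint64_t index = next_index_++;
        slots_[index % slots_.size()] = std::move(payload);
        return index;
    }

    // Returns nullptr if the index was never written or has been overwritten;
    // a real system would then fall back to reading the entry from disk.
    const std::string* get(std::uint64_t index) const {
        if (index == 0 || index >= next_index_) return nullptr;
        if (next_index_ - index > slots_.size()) return nullptr;
        return &slots_[index % slots_.size()];
    }

private:
    std::vector<std::string> slots_;  // fixed-size storage, reused in a circle
    std::uint64_t next_index_ = 1;    // index the next append will receive
};
```

With a capacity of 4, appending six entries leaves indices 3 through 6 in the buffer, while lookups for indices 1 and 2 return `nullptr`.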

Application Scenarios : RaftKeeper is deployed at JD Retail across many large‑scale use cases and has proven reliable during major sales events. In ClickHouse, it removes metadata bottlenecks and accelerates massive data imports; in HBase, it supports 300k concurrent clients with lower latency. It also serves cluster management, node coordination, configuration centers, and naming services.

Project Address : Contact: [email protected]. Repository: https://github.com/JDRaftKeeper/RaftKeeper . Users are invited to try the system and provide feedback.
