How We Replaced Expensive Redis Clusters with KVROCKS on SSDs

Facing a ten‑fold cost increase for public‑cloud Redis, Ctrip engineers evaluated Redis‑on‑SSD alternatives, chose KVROCKS, performed extensive protocol‑compatible modifications, benchmarked performance on standard and Optane SSDs, and demonstrated substantial cost savings while preserving Redis semantics.

dbaplus Community

Background

In 2019, Ctrip’s G2 strategy required large‑capacity Redis clusters to serve overseas customers. Per GB, Redis on the public cloud cost roughly ten times as much as on Ctrip’s private cloud, which prompted a search for a cheaper SSD‑based Redis replacement.

Research and Selection

The team examined three Redis alternatives: Redis On Flash (commercial) and the open‑source Pika and KVROCKS. Pika’s drawbacks were a binary client protocol, protobuf‑based replication, heavy Rsync‑style multi‑process replication, and messy code. KVROCKS offered closer semantic and replication compatibility with native Redis, though it was newly open‑sourced and had limited adoption.

Because KVROCKS’ framework and codebase were easier to grasp, the team decided to extend KVROCKS to support Redis’s SYNC/PSYNC replication protocol, enabling compatibility with Ctrip’s xpipe replication tool.

Redis replication overview

Secondary Development

Key steps to turn a KVROCKS instance into a Redis slave:

1. Simulate the slave state machine after SLAVEOF (see Fig. 3).

2. During full sync (REPL_STATE_TRANSFER), receive and parse the master’s RDB file.

3. Insert parsed keys into RocksDB using the appropriate data type commands.

4. Enter the CommandPropagate phase, continuously receiving incremental commands from the master and ACKing the replication offset each second.
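The wire format behind these steps can be sketched in a few lines of Python. This is an illustration of the standard Redis replication messages (PSYNC, the +FULLRESYNC reply, the length header that precedes the RDB payload, and REPLCONF ACK), not the code the team added to KVROCKS. The function names are hypothetical, and the diskless "$EOF:" transfer variant is deliberately not handled.

```python
from typing import Tuple


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def parse_fullresync(line: bytes) -> Tuple[str, int]:
    """Parse the master's '+FULLRESYNC <replid> <offset>' reply line."""
    fields = line.decode().strip().split(" ")
    if len(fields) != 3 or fields[0] != "+FULLRESYNC":
        raise ValueError("not a FULLRESYNC reply: %r" % line)
    return fields[1], int(fields[2])


def parse_rdb_header(line: bytes) -> int:
    """Parse the '$<length>' line that precedes the RDB payload.

    Returns the number of RDB bytes to read next. The diskless
    '$EOF:<delimiter>' form is not supported in this sketch.
    """
    text = line.decode().strip()
    if not text.startswith("$") or text.startswith("$EOF:"):
        raise ValueError("unsupported RDB header: %r" % line)
    return int(text[1:])


def replconf_ack(offset: int) -> bytes:
    """Build the acknowledgement a replica sends about once per second."""
    return encode_command("REPLCONF", "ACK", str(offset))
```

A replica that has never synced would send encode_command("PSYNC", "?", "-1"), read the +FULLRESYNC line and the RDB length header, consume that many bytes of RDB data, and then send replconf_ack(offset) periodically while applying the incremental command stream.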

The KVROCKS replication module already supported KVROCKS‑to‑KVROCKS sync; the team added a new state machine for Redis‑to‑KVROCKS sync, renamed the original KVROCKS replication to kslaveof, and introduced fields such as repl_offset, repl_id, and slave_mode to distinguish Redis vs. KVROCKS slaves.
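As a rough picture of how such a state machine might track those fields, here is a minimal Python sketch. The field names repl_offset, repl_id, and slave_mode come from the text above; the class names, the simplified set of states, and the transition methods are assumptions made for illustration and do not reflect the actual KVROCKS C++ implementation.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ReplState(Enum):
    NONE = auto()        # no master configured
    CONNECTING = auto()  # SLAVEOF received, handshake in progress
    TRANSFER = auto()    # full sync: receiving and loading the RDB file
    CONNECTED = auto()   # command propagation: applying incremental commands


class SlaveMode(Enum):
    REDIS = "slaveof"     # replicating from a Redis master
    KVROCKS = "kslaveof"  # replicating from another KVROCKS instance


@dataclass
class ReplicaState:
    slave_mode: SlaveMode
    repl_id: str = "?"     # unknown until the master announces it
    repl_offset: int = -1  # unknown until the master announces it
    state: ReplState = ReplState.NONE

    def _expect(self, expected: ReplState) -> None:
        if self.state is not expected:
            raise RuntimeError("expected %s, in %s" % (expected, self.state))

    def on_slaveof(self) -> None:
        self._expect(ReplState.NONE)
        self.state = ReplState.CONNECTING

    def on_fullresync(self, repl_id: str, offset: int) -> None:
        self._expect(ReplState.CONNECTING)
        self.repl_id, self.repl_offset = repl_id, offset
        self.state = ReplState.TRANSFER

    def on_rdb_loaded(self) -> None:
        self._expect(ReplState.TRANSFER)
        self.state = ReplState.CONNECTED

    def on_command(self, raw: bytes) -> None:
        # The offset advances by the number of stream bytes processed.
        self._expect(ReplState.CONNECTED)
        self.repl_offset += len(raw)
```

Keeping slave_mode on the state object lets one replication module pick the Redis or KVROCKS code path without duplicating the offset bookkeeping.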

KVROCKS replication state machine

Performance Evaluation

Testing across roughly 100 versions and two months in production yielded the following findings:

KVROCKS on SSD shows latency and QPS comparable to in‑memory Redis when it runs with 4 client threads and 1 replication thread, and RocksDB is configured with a 128 MiB block cache, a 64 MiB write buffer, and a 2 GiB WAL size.

Increasing client‑handling threads from 1 to 4 improves response time, but scaling beyond 4 offers no benefit (see Fig. 7).

Intel Optane SSDs outperform regular SSDs: SET latency drops to about one third, and QPS roughly triples (Figs. 8‑11).

Four‑thread KVROCKS on Optane even surpasses Redis in some workloads (Fig. 13).

KVROCKS vs Redis performance

Cost Savings

After Redis was replaced with KVROCKS, CPU usage rose to ~100 % while memory per instance dropped from 6 GiB to ~1 GiB. The team estimates a 60‑80 % reduction in overall infrastructure cost (≈63 % for the tested cluster).

Cost comparison

Pitfalls Encountered

Compilation requires --with-jemalloc-prefix=je_ for container compatibility (see issue #54).

Binary incompatibility when building on newer CPUs and running on older machines, likely due to Snappy compression instructions.

RocksDB fails on certain virtual file systems; switching to XFS resolves the issue (issue #56).

Type mismatch for SETBIT commands because Redis treats the value as a string while KVROCKS treats it as a bitmap; a temporary workaround prefixes such keys with bit_.

Concurrent access to libevent’s evbuffer caused crashes; adding a lock fixed it (see Fig. 18).

A deadlock in KVROCKS pub/sub was quickly fixed upstream (issue #68).
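The SETBIT workaround mentioned above can be pictured as a small command-rewriting step. The article does not say where the prefix was applied, so the sketch below is only one possible reading: a hypothetical helper that adds the bit_ prefix to the key of bitmap commands before they reach KVROCKS. The function name and the set of commands treated as bitmap commands are assumptions.

```python
from typing import List

BIT_PREFIX = "bit_"
# Assumption: these single-key commands are the ones routed to the bitmap type.
BITMAP_COMMANDS = {"SETBIT", "GETBIT", "BITCOUNT", "BITPOS"}


def rewrite_command(args: List[str]) -> List[str]:
    """Return a copy of the command with its key prefixed for bitmap commands.

    Commands outside BITMAP_COMMANDS, and keys that already carry the
    prefix, are returned unchanged.
    """
    if len(args) < 2 or args[0].upper() not in BITMAP_COMMANDS:
        return list(args)
    key = args[1]
    if key.startswith(BIT_PREFIX):
        return list(args)
    return [args[0], BIT_PREFIX + key] + list(args[2:])
```

Under this reading, string commands and bitmap commands never touch the same key, which avoids the type conflict at the cost of a naming convention that every client has to follow.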

evbuffer lock issue

Future Outlook

More than 50 % of public‑cloud Redis instances have already been migrated to KVROCKS. The roadmap includes replacing all replaceable Redis workloads, supporting SLAVEOF to KVROCKS, and eventually open‑sourcing the enhancements.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
