RedKV: Xiaohongshu's Distributed NoSQL KV Storage System
RedKV, Xiaohongshu's self-developed distributed NoSQL key-value store, combines a Redis-compatible interface, RocksDB-backed storage, and a shard-managed multi-cloud architecture, delivering roughly three times the write throughput and 1.5 times the read throughput of comparable open-source systems, nearly 40% cost savings over HBase, and sub-5 ms P999 latency for petabyte-scale workloads.
RedKV is a self-developed distributed NoSQL key-value storage system created by Xiaohongshu (Little Red Book). It supports both leaderless and leader-based control architectures and was built to satisfy the company's real-time KV storage requirements. RedKV 1.0 relies on the Gossip protocol for node management, while RedKV 2.0 introduces a central Shard-managed architecture that enables global multi-cloud, multi-replica online elastic scaling, disaster recovery, and service switchover within seconds.
In terms of performance and cost, RedKV outperforms open-source alternatives: aggregated write throughput is on average three times, and read throughput 1.5 times, that of comparable open-source products, and the system achieves cost savings of nearly 40% relative to HBase. RedKV offers partial compatibility with the Redis protocol, efficiently handling the string, hash, and sorted-set data types that constitute the majority of Xiaohongshu's online storage workloads. This compatibility alleviates the cost pressures of early Redis cluster deployments and mitigates the performance and stability issues encountered with HBase. Additionally, RedKV can exchange data with the Hive data warehouse, providing a pathway for offline analytical workloads.
The system is organized into three layers. The access layer exposes a Redis‑compatible interface and supplies both community SDKs and a custom middleware layer. The proxy layer is a stateless CorvusPlus process capable of handling tens of millions of queries per second and scaling horizontally. The storage layer persists data using RocksDB, delivering high‑reliability read/write service. Figure 1 in the source illustrates the RedKV 1.0 architecture.
Several enhancements are implemented in the proxy layer. CorvusPlus applies a token-bucket multi-dimensional rate limiter (connections, bandwidth, QPS) to absorb traffic spikes and avoid cascading failures. Online LZ4 compression of incoming write data reduces network bandwidth and storage consumption by more than 40% with no noticeable latency increase. The thread model is optimized so that a response is returned as soon as all preceding commands on the connection have completed, reducing head-of-line blocking. A backup-read mechanism issues parallel requests to two replicas when observed latency exceeds the P95 threshold; the request succeeds if either reply does, cutting P999 latency from roughly 35 ms to about 4 ms. Large-key detection metrics expose the keys that cause latency outliers, enabling mitigation through compression or other tactics.
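The backup-read mechanism is essentially a hedged request: if the first replica has not answered within a latency threshold, the same request is fired at a second replica, and the first successful reply wins. A minimal Python sketch under that interpretation; the function names, threshold value, and simulated replicas below are illustrative, not RedKV's actual API:

```python
import concurrent.futures
import time

def hedged_get(key, replicas, hedge_after=0.005):
    """Query the first replica; if no reply arrives within `hedge_after`
    seconds (standing in for the P95 threshold), also query the backup.
    The first successful reply completes the request."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(replicas[0], key)]
        done, _ = concurrent.futures.wait(futures, timeout=hedge_after)
        if not done:  # primary is slow: fire the backup read
            futures.append(pool.submit(replicas[1], key))
        for fut in concurrent.futures.as_completed(futures):
            try:
                return fut.result()
            except Exception:
                continue  # this replica failed; use the other reply
        raise RuntimeError("all replicas failed")

def slow_primary(key):
    time.sleep(0.05)  # simulates a replica stuck past the threshold
    return f"primary:{key}"

def fast_backup(key):
    return f"backup:{key}"

print(hedged_get("user:42", [slow_primary, fast_backup]))  # prints backup:user:42
```

Because the hedge only fires once the threshold is exceeded, the extra replica load is limited to the slow tail of requests, which is how the technique can compress P999 latency without doubling read traffic.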
Storage nodes run a multithreaded server: multiple I/O threads listen on a port and distribute incoming connections evenly. Each thread maintains its own request queue, and a pool of worker threads drains the queues; all operations on a given key or hash slot are handled by the same worker, which eliminates the need for explicit key-level locking. The worker threads re-encode the data and store it in a local RocksDB instance. This lock-free threading model is depicted in Figure 9. Data is stored with a clear separation between meta-key/meta-value and data-key/data-value pairs, and a hash-based slot partitioning scheme spreads keys across slots, avoiding the hotspots that contiguous range partitioning would create.
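The same-worker routing can be sketched as hash-slot dispatch: compute a slot from the key, honoring a `{...}` hash tag so related keys land together, then map the slot to a fixed worker queue. A minimal Python sketch; the slot count and the CRC32 hash are illustrative stand-ins (Redis-style clusters use CRC16 over 16384 slots, and RedKV's exact scheme is not specified here):

```python
import zlib

NUM_SLOTS = 16384   # illustrative; borrowed from Redis Cluster
NUM_WORKERS = 8     # illustrative worker-pool size

def hash_tag(key: str) -> str:
    """If the key contains a non-empty {...} section, only that part is
    hashed, so e.g. {person:1}_1 and {person:1}_2 share a slot."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            return key[start + 1:end]
    return key

def slot_of(key: str) -> int:
    # zlib.crc32 as a stand-in for the real slot hash (Redis uses CRC16).
    return zlib.crc32(hash_tag(key).encode()) % NUM_SLOTS

def worker_of(key: str) -> int:
    """Every operation on keys in one slot lands on one worker queue,
    so per-key locking is unnecessary by construction."""
    return slot_of(key) % NUM_WORKERS

# Keys sharing a hash tag always route to the same worker.
assert worker_of("{person:1}_1") == worker_of("{person:1}_2")
```

Pinning a slot to a single worker trades some load-balancing flexibility for lock-free execution: serialization happens at the queue, not via mutexes on individual keys.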
RedKV provides a suite of data management features. One-way replication is realized by extending the Redis replication protocol, allowing cluster expansion in powers of two; after expansion, a background task removes data that no longer belongs to a node according to the key sharding scheme. For multi-cloud active-active deployments, a checkpoint-based replication strategy enables bidirectional synchronization without requiring additional cleanup jobs. Bulk loading of offline data from Hive/S3 is performed via a custom Hive UDTF that encodes records into SST files, which are transferred across clouds with Hadoop distcp and ingested by a sidecar running on each RedKV node. Bulk export scans keys prefixed with a table name, converts the KV data into Hive-compatible columnar Parquet files, and loads them into Hive. An example of the export key format is shown below.
hmset {person}_1 name John quantity 20 price 200.23
hmset {person}_2 name Henry quantity 30 price 3000.45
For sharded tables the pattern includes a shard identifier, as illustrated here.
hmset {person:1}_1 name John quantity 20 price 200.23
hmset {person:1}_2 name Henry quantity 30 price 3000.45
...
hmset {person:16}_100000 name Tom quantity 43 price 234.56
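The export step can be sketched as scanning keys with a given table prefix, stripping the hash tag, and pivoting each hash's fields into a table row. A minimal in-memory Python sketch following the key pattern of the hmset examples above; real output would be Parquet written with a library such as pyarrow, which is out of scope here:

```python
def export_table(store: dict, table: str) -> list[dict]:
    """Scan keys of the form {table}_<id> (or {table:shard}_<id>) and
    pivot each hash into a row keyed by its id. `store` stands in for
    a scan over the KV data; it is not RedKV's actual scan API."""
    rows = []
    for key, fields in store.items():
        tag_end = key.find("}")
        if not key.startswith("{" + table) or tag_end == -1:
            continue
        row_id = key[tag_end + 2:]          # skip "}_" to get the id
        rows.append({"id": row_id, **fields})
    return sorted(rows, key=lambda r: r["id"])

# Hashes as written by the hmset examples above.
store = {
    "{person}_1": {"name": "John", "quantity": "20", "price": "200.23"},
    "{person}_2": {"name": "Henry", "quantity": "30", "price": "3000.45"},
}
print(export_table(store, "person"))
```

Because every row of a table carries the same `{table}` hash tag, a full-table scan stays within a predictable slot range, which is what makes the prefix scan cheap.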
Data backup and recovery rely on periodic snapshots shipped to a disaster‑recovery cluster; the proxy layer can switch traffic to a specific snapshot version within seconds, meeting minute‑level recovery SLAs for advertising data.
The multi-cloud active-active architecture places a replicator sidecar on the same host as the RedKV server, minimizing network overhead and allowing dual-write, dual-read across regions. This design provides regional disaster recovery without dedicating idle standby resources and can be adapted to other systems such as Redis or graph databases.
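The checkpoint-based replication the sidecar performs can be sketched as each side keeping an operation log plus a checkpoint into the peer's log: the replicator ships only operations past the checkpoint, and each entry is tagged with its origin so a bidirectionally synced write is not echoed back. The class and function names below are illustrative, not RedKV's implementation:

```python
class Store:
    """A toy KV store with an operation log, standing in for one
    region's RedKV cluster."""
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.log = []          # (origin, key, value) in apply order

    def put(self, key, value, origin=None):
        self.data[key] = value
        self.log.append((origin or self.name, key, value))

def sync(src: Store, dst: Store, checkpoint: int) -> int:
    """Ship src's log entries past `checkpoint` to dst, skipping entries
    that originated at dst (prevents echo loops in bidirectional sync).
    Returns the advanced checkpoint."""
    for origin, key, value in src.log[checkpoint:]:
        if origin != dst.name:
            dst.put(key, value, origin=origin)
    return len(src.log)

a, b = Store("cloud-a"), Store("cloud-b")
a.put("k1", "v1")
b.put("k2", "v2")
ck_ab = sync(a, b, 0)   # a -> b
ck_ba = sync(b, a, 0)   # b -> a
assert a.data == b.data == {"k1": "v1", "k2": "v2"}
```

Because the checkpoint only ever advances past already-shipped entries, there is nothing left behind to garbage-collect, which is why this scheme needs no additional cleanup jobs.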
In production, replacing HBase with RedKV for the zprofile middleware reduced total cost of ownership by 36% and cut P99 latency roughly fivefold. The platform now sustains nearly 100 million queries per second and petabyte-scale storage while supporting core business functions such as user profiling, note storage, risk control, and recommendation.
References cited in the article include Pinterest's RocksDB replicator, Facebook's RocksDB project, and the HeavyKeeper algorithm for top-k elephant flow detection.
Xiaohongshu Tech REDtech
Official account of the Xiaohongshu tech team, sharing tech innovations and problem insights, advancing together.