
Design and Optimization of a Disk‑Based KV Store Compatible with Redis on TiKV

The article details a Redis‑compatible, disk‑based KV service built atop TiKV using a compute‑storage split (Tula), describes custom key encoding and expiration mechanisms, and explains four optimization stages that introduce slot‑based hashing and adaptive concurrency to dramatically cut garbage‑collection latency while preserving write performance.

vivo Internet Technology

This article, authored by the Vivo Internet Server Team, describes the design of a Redis‑compatible KV storage service built on top of TiKV, aiming to reduce storage costs while minimizing migration effort.

Background: The company lacked a unified KV service; many services ran Redis clusters, which are memory-intensive and costly. To lower costs, a disk-based KV service (referred to as "disk KV") was created by wrapping a compute layer, named Tula, over a TiKV cluster and exposing a Redis-like interface.

System Architecture: The design follows a compute-storage separation model. Tula simulates Redis slots and forwards client commands to TiKV. An architecture diagram (Fig. 1) illustrates the relationship between the external Redis client, Tula, and TiKV.

Data Encoding: Because TiKV offers only a plain key-value API, Redis data structures must be encoded onto it. Two levels of keys are introduced:

metaKey – the user-visible key, encoded with fields such as namespace, dbid, role (M for metaKey, D for dataKey), keyname, and a suffix for complex types.

dataKey – stores the actual elements of complex structures (e.g., members of a SET, fields of a HASH) and is linked from the metaKey via a UUID.

The encoding respects TiKV's lexicographic ordering, enabling efficient range scans. Figures 2-5 illustrate the key encoding for generic keys, for a SADD command on a SET, and the resulting dataKey layout.
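The two-level scheme can be sketched as follows. The field names (namespace, dbid, role, keyname) come from the article, but the separator, field widths, and exact byte layout here are assumptions (the article's Figures 2-5 show the real encoding); the point is the ordering property: every dataKey of one complex key shares its UUID prefix, so a TiKV prefix scan returns all members contiguously.

```python
import uuid

def encode_meta_key(namespace: bytes, dbid: int, keyname: bytes) -> bytes:
    # Role byte "M" marks a metaKey; a fixed-width dbid keeps keys of
    # different databases in lexicographic order. Layout is illustrative.
    return b"|".join([namespace, str(dbid).zfill(4).encode(), b"M", keyname])

def encode_data_key(namespace: bytes, dbid: int,
                    key_uuid: bytes, member: bytes) -> bytes:
    # Role byte "D" marks a dataKey; the UUID links it back to its
    # metaKey. All members of one SET/HASH share the UUID prefix, so a
    # single prefix scan over TiKV yields the whole structure.
    return b"|".join([namespace, str(dbid).zfill(4).encode(), b"D",
                      key_uuid.hex().encode(), member])

# Two members of the same SET sort together under the same UUID prefix:
u = uuid.uuid4().bytes
k_alice = encode_data_key(b"ns", 0, u, b"alice")
k_bob = encode_data_key(b"ns", 0, u, b"bob")
prefix = k_alice[: k_alice.rfind(b"|") + 1]
assert k_bob.startswith(prefix)
```

A real implementation would also have to escape the separator inside user-supplied fields; that detail is omitted here.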

Expiration & GC Design: To support Redis-style expiration, an expireKey is introduced that places the expiration timestamp at the front of the key, so expired keys can be found with a fast lexicographic range scan. GC is handled via a gcKey that records the UUIDs of keys whose data must be removed after expiration. Figures 7-10 show the structures and processing flows for expiration and GC.
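The timestamp-first layout can be illustrated with a small sketch. The "E" prefix and the millisecond unit are assumptions (the real field layout is in the article's figures); what matters is that a big-endian fixed-width timestamp sorts numerically under byte-wise comparison:

```python
import struct

def encode_expire_key(expire_at_ms: int, meta_key: bytes) -> bytes:
    # A big-endian fixed-width timestamp at the front of the key means
    # that one range scan from the start of the expire space up to
    # "now" yields exactly the keys that have already expired.
    return b"E" + struct.pack(">Q", expire_at_ms) + meta_key

# Earlier deadlines sort first, regardless of the user key's own bytes:
assert encode_expire_key(1_000, b"zzz") < encode_expire_key(2_000, b"aaa")
```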

Problem Description: In production, windowed data caused excessive disk usage because GC lagged behind expiration. Analysis revealed that GC was single-threaded and that the gcKey lacked a slot-style hash, limiting concurrency.

Optimization Stages:

Stage 1: Minor tweaks, such as reducing sleep intervals and increasing batch sizes, yielded only limited improvement.

Stage 2: Introduced slot-like hashing on the first byte of each UUID, allowing 256 concurrent GC coroutines. This dramatically cut GC time.
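The slot-style dispatch can be sketched as below. The shard count of 256 follows directly from using one UUID byte; the per-shard queueing and scanning details are assumptions:

```python
import uuid

def gc_shard(uuid_bytes: bytes) -> int:
    # The first byte of the UUID selects one of 256 independent GC
    # shards, so 256 coroutines can each scan their own gcKey range
    # instead of contending on one sequential scan.
    return uuid_bytes[0]

# Uniformly random UUIDs spread the GC work across the 256 workers:
shards = {gc_shard(uuid.uuid4().bytes) for _ in range(10_000)}
assert shards <= set(range(256))
```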

Stage 3: Added an adaptive mechanism that monitors TiKV load and dynamically adjusts GC aggressiveness (sleep time, batch size) to avoid impacting foreground writes.
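A minimal sketch of such an adaptive loop, assuming load is reported as a 0-1 utilization figure; the thresholds and bounds below are illustrative tunables, not values from the article:

```python
def adapt_gc(load: float, sleep_ms: int, batch: int,
             high: float = 0.8, low: float = 0.4) -> tuple[int, int]:
    # Back off (sleep longer, shrink batches) when TiKV is busy so GC
    # never starves foreground writes; ramp up when TiKV is idle.
    if load > high:
        return min(sleep_ms * 2, 1_000), max(batch // 2, 10)
    if load < low:
        return max(sleep_ms // 2, 10), min(batch * 2, 1_000)
    return sleep_ms, batch
```

Each GC round would call this with the latest load sample before starting its next scan, so aggressiveness tracks cluster pressure round by round.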

Stage 4: Extended concurrency across multiple Tula nodes by mapping each node's Redis slot range to a hash range (using two bytes of the UUID), enabling multi-node parallel GC.
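The slot-to-hash mapping could look like the sketch below, scaling each node's share of the 16384 Redis slots proportionally onto the 16-bit space covered by the first two UUID bytes. The proportional mapping is an assumption; the article's diagrams show the actual scheme:

```python
def node_hash_range(slot_start: int, slot_end: int,
                    total_slots: int = 16_384,
                    hash_space: int = 1 << 16) -> tuple[int, int]:
    # Scale the node's slot range [slot_start, slot_end] onto the
    # 16-bit space formed by the first two UUID bytes.
    lo = slot_start * hash_space // total_slots
    hi = (slot_end + 1) * hash_space // total_slots - 1
    return lo, hi

def owns_uuid(uuid_bytes: bytes, lo: int, hi: int) -> bool:
    # A node GCs only the UUIDs whose first two bytes fall in its range.
    return lo <= int.from_bytes(uuid_bytes[:2], "big") <= hi

# Two nodes splitting the slot space cover the hash space without overlap:
assert node_hash_range(0, 8_191) == (0, 32_767)
assert node_hash_range(8_192, 16_383) == (32_768, 65_535)
```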

Each stage is accompanied by diagrams (Figs 11‑16) and a brief discussion of results, showing progressive reductions in GC latency while preserving write performance.

Results: Benchmarking with 5 million SET writes of 4 KB values shows that Stage 2 cuts GC time by more than half, and that Stage 4 improves it further as the node count grows (see Table 1).

Future Work: Plans include expanding monitoring, investigating client-go bottlenecks, and refining GC parameters to achieve even higher throughput.

Tags: Garbage Collection, Database Optimization, distributed-storage, Redis compatibility, KV Store, TiKV