How Lest Redefines Persistent Caching for Petabyte‑Scale KV Stores
This article presents Lest, a persistent KV‑store cache designed to overcome the limitations of traditional in‑memory caches by combining disk‑based persistence, lock‑free multithreading, custom protocols, and advanced load‑balancing, and it evaluates its performance on petabyte‑scale workloads.
Background and Motivation
Typical cache systems such as Memcached keep data only in memory and lack persistence. Even Redis, while adding persistence, still raises concerns: performance drops when persistence is enabled, recovery after a crash can be problematic, memory limits data size, and existing designs often ignore persistence, making them awkward to use.
Introducing Lest
Lest is a persistent cache (more accurately a storage system) built as a KV store supporting String, List, and Map. It is already deployed in production, handling 1‑2 PB of data and serving billions of requests per day.
Key Design Goals
Seamless cache synchronization with zero‑downtime scaling.
Fast recovery from master failures; standby nodes can take over quickly.
Support for complex data types (List, Map) on disk without sacrificing query performance.
Architecture Overview
The system consists of a Tracker (cache proxy layer), storage nodes, and operation nodes that partition data into segments (e.g., 256 or 128 segments). Communication and storage use a custom protocol.
Load Balancing and Hashing
Keys are custom‑defined, requiring a hash‑based distribution. Lest adopts a two‑level hash: the first hash selects a segment, the second selects a machine. To minimize data movement during scaling, nodes are added in powers of two (2 × N), reducing the migration ratio to about 50‑66%.
Data Model and Storage
Each stored item includes metadata (length, version, type) followed by the actual payload. The implementation is in C, delivering roughly ten‑fold performance gains over Java‑based solutions. An ever‑increasing monotonic ID generator, derived from a time‑vector algorithm, ensures version ordering.
Lists and Maps share a similar layout to Strings; the details are illustrated in the accompanying diagrams.
Synchronization Mechanism
Within the same group, storage nodes synchronize via the Tracker, using a high‑speed IP protocol. Changes are recorded in binlogs, which are replayed for consistency. The system operates without traditional slaves; all nodes are masters, simplifying failover.
Performance Evaluation
Benchmarks show that Lest on a 10 GbE server with SSDs outperforms a 1 GbE server with SATA disks. Throughput reaches about 60 % of the theoretical maximum due to limited client count (10 clients). Latency remains stable around 1‑2 ms for most operations, with occasional outliers up to a few hundred milliseconds.
When the data set exceeds 10 K entries, performance gradually declines, similar to Redis, indicating that both systems favor smaller datasets.
Advantages and Recommendations
Lest consumes relatively little memory, relies on disk (preferably SSD) for bulk storage, and can replace dozens of Redis instances with just a few nodes. It offers transparent integration for developers and benefits from lower cost per GB compared to memory‑only caches.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
