
How KVSSD Integrates LSM Trees and Flash Translation to Slash Write Amplification

This article reviews the KVSSD paper presented at DATE 2018. It explains how closely integrating LSM trees with the flash translation layer reduces write amplification, outlines the key design optimizations (K2P mapping, remapping compaction, and hot-cold separation), and discusses the performance results and subsequent industry progress.


Background

The paper "KVSSD: Close integration of LSM trees and flash translation layer for write‑efficient KV store" was presented at the 2018 Design, Automation & Test in Europe (DATE) conference by Sung‑Ming Wu, Kai‑Hsiang Lin, Li‑Pin Chang and others.

It proposes providing a KV interface directly on SSDs by tightly coupling LSM trees with the flash translation layer (FTL) to avoid the write amplification caused by multiple software layers (LSM tree, host file system, FTL).

Write Amplification in KV Storage Systems

A typical KV storage stack consists of an LSM tree (with mutable memtables and immutable SSTables) on top of a file system, which in turn sits on an SSD managed by an FTL. Compaction, file system metadata, and the read‑modify‑write nature of SSDs all contribute to write amplification.

LSM tree compaction

File system overhead

FTL read‑modify‑write

FTL garbage collection

The paper focuses on the last three factors, multiplying them to estimate overall write amplification.
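Written as a simple multiplicative model (the notation and the sample numbers in the comments are illustrative, not taken from the paper):

```latex
WA_{\text{total}} \;\approx\; WA_{\text{FS}} \times WA_{\text{RMW}} \times WA_{\text{GC}}
% e.g. hypothetical factors of 1.2 (file system), 2.0 (read-modify-write),
% and 1.5 (garbage collection) give roughly 3.6 physical writes per logical
% write, before the LSM tree's own compaction multiplies the total further.
```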

How to Alleviate Write Amplification?

Various algorithmic and software approaches have been explored: LSM‑trie and PebblesDB restructure the tree and relax compaction so that less data is rewritten, while WiscKey, Badger, and TiKV’s Titan engine separate values from keys so that compaction only rewrites small key‑pointer records.
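To make the key-value separation idea concrete, here is a toy C sketch; the names and structures are illustrative assumptions, not code from any of these engines:

```c
/* A self-contained toy sketch of key-value separation in the spirit of
 * WiscKey/Badger/Titan (illustrative only; not taken from any engine's code):
 * the value is appended to a value log, and the tree indexes only a small
 * pointer record, so compaction rewrites pointers instead of full values. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint64_t offset;   /* byte offset of the value in the value log */
    uint32_t length;   /* value length in bytes */
} value_ptr_t;

static char     vlog[1 << 20];   /* toy append-only value log */
static uint64_t vlog_tail = 0;

/* Append a value to the log and return its offset. */
static uint64_t vlog_append(const void *val, uint32_t len)
{
    uint64_t off = vlog_tail;
    memcpy(vlog + off, val, len);
    vlog_tail += len;
    return off;
}

/* Stand-in for the LSM-tree insert: only the small pointer is indexed. */
static void lsm_put(const char *key, value_ptr_t ptr)
{
    printf("index %s -> (offset=%llu, length=%u)\n",
           key, (unsigned long long)ptr.offset, ptr.length);
}

int main(void)
{
    const char *value = "a large value that never enters compaction";
    value_ptr_t ptr = { vlog_append(value, (uint32_t)strlen(value)),
                        (uint32_t)strlen(value) };
    lsm_put("user:42", ptr);
    return 0;
}
```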

Filesystem‑level optimizations include KV‑aware filesystems, higher‑ratio transparent compression, and workload‑specific tuning.

Hardware‑level solutions involve either removing abstractions (e.g., Open‑Channel SSDs) or adding new ones (e.g., KV interfaces in firmware, compute‑offload SSDs).

KVSSD Design

KVSSD follows the “add abstraction” path by implementing a native KV interface in firmware. It uses a flash‑native LSM tree called nLSM (NAND‑flash‑LSM). nLSM replaces the L2P mapping of the FTL with a K2P (Key‑to‑Physical) mapping based on a key‑range tree, allocating a 4 MiB flash block per SSTable for alignment and contiguous storage.

K2P Mapping

K2P stores non‑overlapping key‑range‑to‑SSTable mappings in a key‑range tree. Each SSTable has a metadata page that points to its KV pages, each of which holds sorted KV pairs. During compaction, old SSTables are merged into new ones, and because every SSTable occupies its own flash block, the obsolete blocks can be erased without copying any live data.
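To make the structure concrete, here is a simplified C sketch of a K2P lookup; the integer keys, fixed-size arrays, and names are illustrative assumptions rather than the paper's firmware structures:

```c
/* Simplified sketch of a K2P lookup. Each SSTable occupies one flash block;
 * its metadata page records the first key and physical page number of every
 * KV page, and a key-range structure maps non-overlapping key ranges to
 * those SSTables. */
#include <stdint.h>
#include <stdio.h>

#define PAGES_PER_BLOCK 128                  /* e.g. 4 MiB block / 32 KiB page */

typedef struct {                             /* per-SSTable metadata page */
    uint64_t first_key[PAGES_PER_BLOCK];     /* smallest key in each KV page */
    uint32_t kv_page_ppn[PAGES_PER_BLOCK];   /* physical page number of each KV page */
    uint32_t num_pages;
} meta_page_t;

typedef struct {                             /* key-range entry (ranges don't overlap) */
    uint64_t min_key, max_key;
    meta_page_t *meta;                       /* metadata page of the SSTable's block */
} k2p_entry_t;

static k2p_entry_t k2p_tree[64];             /* toy stand-in for the key-range tree */
static int k2p_entries = 0;

/* Resolve a key to the physical KV page that may contain it. */
static int k2p_lookup(uint64_t key, uint32_t *ppn_out)
{
    for (int i = 0; i < k2p_entries; i++) {
        if (key < k2p_tree[i].min_key || key > k2p_tree[i].max_key)
            continue;
        meta_page_t *m = k2p_tree[i].meta;
        for (int p = (int)m->num_pages - 1; p >= 0; p--) {
            if (m->first_key[p] <= key) {    /* last KV page starting at or below key */
                *ppn_out = m->kv_page_ppn[p];
                return 0;
            }
        }
    }
    return -1;                               /* key not mapped */
}

int main(void)
{
    static meta_page_t m = { .first_key = {100, 200},
                             .kv_page_ppn = {7, 8}, .num_pages = 2 };
    k2p_tree[k2p_entries++] = (k2p_entry_t){ .min_key = 100, .max_key = 299,
                                             .meta = &m };
    uint32_t ppn;
    if (k2p_lookup(250, &ppn) == 0)
        printf("key 250 -> physical page %u\n", ppn);   /* prints page 8 */
    return 0;
}
```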

Remapping Compaction

Instead of rewriting all KV pages during compaction, remapping compaction updates only a few metadata pages to point to existing KV pages, reducing the write cost dramatically.
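A rough C sketch of the idea, under the simplifying assumption that the two inputs' KV pages can be merged at page granularity (the paper's handling of pages whose key ranges overlap is more involved); all names are illustrative:

```c
/* Sketch of remapping compaction: rather than reading, merging, and rewriting
 * every KV page, compaction writes one new metadata page whose entries point
 * at the inputs' existing physical KV pages.
 * Assumes a->num_pages + b->num_pages <= PAGES_PER_BLOCK. */
#include <stdint.h>

#define PAGES_PER_BLOCK 128

typedef struct {
    uint64_t first_key[PAGES_PER_BLOCK];     /* smallest key in each KV page */
    uint32_t kv_page_ppn[PAGES_PER_BLOCK];   /* physical page numbers */
    uint32_t num_pages;
} meta_page_t;

/* Merge two SSTables' metadata by first key; no KV page is copied. */
void remap_compact(const meta_page_t *a, const meta_page_t *b, meta_page_t *out)
{
    uint32_t i = 0, j = 0, k = 0;
    while (i < a->num_pages || j < b->num_pages) {
        const meta_page_t *src;
        uint32_t *idx;
        if (j >= b->num_pages ||
            (i < a->num_pages && a->first_key[i] <= b->first_key[j])) {
            src = a; idx = &i;
        } else {
            src = b; idx = &j;
        }
        out->first_key[k]   = src->first_key[*idx];
        out->kv_page_ppn[k] = src->kv_page_ppn[*idx];  /* remap, don't rewrite */
        (*idx)++;
        k++;
    }
    out->num_pages = k;  /* one metadata-page write replaces many data-page writes */
}
```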

Hot‑Cold Separation

By grouping data with similar lifetimes into the same flash blocks, garbage collection can target cold data separately, lowering write amplification.
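A minimal C sketch of the write-path side of this idea; the hotness heuristic and names are illustrative assumptions, not the paper's algorithm:

```c
/* Hot-cold separation sketch: data expected to die soon (hot) and long-lived
 * data (cold) go to different open blocks, so a block's pages tend to become
 * invalid together and garbage collection copies fewer live pages. */
#include <stdint.h>
#include <stdio.h>

typedef enum { STREAM_HOT = 0, STREAM_COLD = 1 } stream_t;

typedef struct {
    uint32_t block_id;
    uint32_t next_page;          /* next free page in the open block */
} open_block_t;

static open_block_t open_blocks[2] = { { 10, 0 }, { 11, 0 } };

/* Rough heuristic: SSTables at shallow LSM levels are rewritten often (hot);
 * SSTables at deep levels live long (cold). */
static stream_t classify(int lsm_level)
{
    return lsm_level <= 1 ? STREAM_HOT : STREAM_COLD;
}

static void write_page(int lsm_level, uint32_t logical_page)
{
    stream_t s = classify(lsm_level);
    open_block_t *b = &open_blocks[s];
    printf("page %u -> block %u, offset %u (%s stream)\n",
           logical_page, b->block_id, b->next_page,
           s == STREAM_HOT ? "hot" : "cold");
    b->next_page++;
}

int main(void)
{
    write_page(0, 1000);   /* fresh SSTable from a memtable flush: hot */
    write_page(4, 2000);   /* deep-level SSTable produced by compaction: cold */
    return 0;
}
```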

KVSSD Performance Analysis

The evaluation uses a 15 GiB SSD (5 % reserved, 32 KiB page size, 4 MiB block size) and replays LevelDB I/O traces on an SSD simulator. Compared configurations include LSM, dLSM (delayed compaction), lLSM (lightweight compaction), nLSM, rLSM‑ (nLSM with remapping compaction), and rLSM (nLSM with both remapping compaction and hot‑cold separation).
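For reference, those numbers work out to 4 MiB / 32 KiB = 128 pages per block and 15 GiB / 4 MiB = 3840 blocks in total, of which the 5 % reserve corresponds to 192 blocks.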

Results show that rLSM reduces write amplification to 12 % of the baseline and improves throughput by 4.47×, at the cost of an 11 % increase in read amplification.

Industrial Progress

Samsung adopted the KVSSD concept in a 2019 SYSTOR paper and released an open‑source KVSSD API and driver (https://github.com/OpenMPDK/KVSSD). Their prototype demonstrates TPS that scales linearly with the number of devices, without the host CPU becoming a bottleneck.

The NVMe 2.0 specification, released in June 2021, standardizes KVSSD commands as the NVMe‑KV command set, paving the way for commercial products.

Q&A

From an industry perspective, how do ZNS and KVSSD compare?

ZNS is easier to implement and cheaper, so it currently attracts more attention.

Does KVSSD’s hot‑cold separation cause uneven block wear?

It can; the paper does not detail wear‑leveling, so implementations must address it themselves.

Does KVSSD consume host memory or CPU?

No, KVSSD includes its own memory and processing chip, offloading the workload from the host.

Tags: LSM-Tree, NVMe, Flash Storage, Write Amplification, KVSSD
Written by

Qingyun Technology Community

Official account of the Qingyun Technology Community, focusing on tech innovation, supporting developers, and sharing knowledge. Born to Learn and Share!