
Unlocking Bitcask: How Log-Structured Key-Value Stores Achieve High Performance

This article explains the Bitcask key‑value storage model, covering its log‑structured file design, in‑memory hash index, handling of deletions and updates, periodic merge operations, and the use of hint files to speed up hash index reconstruction.

Java High-Performance Architecture

What is a log‑structured data file?

Bitcask stores data in physical files using an append‑only log style, ensuring sequential writes and high write performance. When a file reaches a configured size, a new active file is created, resulting in N old files plus one active file.
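The rotation behaviour can be sketched in a few lines of Python. This is a minimal illustration, not Bitcask's actual code: the class name `AppendOnlyLog`, the `max_bytes` parameter, and the `NNNNNN.data` file naming are all assumptions made for the example.

```python
import os
import tempfile


class AppendOnlyLog:
    """Hypothetical sketch of an append-only log that rotates by size."""

    def __init__(self, directory, max_bytes):
        self.directory = directory
        self.max_bytes = max_bytes      # assumed size threshold per file
        self.file_id = 0
        self.size = 0                   # bytes written to the active file
        self.active = open(self._path(self.file_id), "ab")

    def _path(self, file_id):
        # Illustrative naming scheme, not Bitcask's real one.
        return os.path.join(self.directory, f"{file_id:06d}.data")

    def append(self, payload):
        """Write payload at the end of the active file; return (file_id, offset)."""
        if self.size > 0 and self.size + len(payload) > self.max_bytes:
            self._rotate()
        offset = self.size
        self.active.write(payload)
        self.active.flush()
        self.size += len(payload)
        return self.file_id, offset

    def _rotate(self):
        # The old file is closed and never written again; a new active file starts.
        self.active.close()
        self.file_id += 1
        self.size = 0
        self.active = open(self._path(self.file_id), "ab")

    def close(self):
        self.active.close()


with tempfile.TemporaryDirectory() as directory:
    log = AppendOnlyLog(directory, max_bytes=10)
    locations = [log.append(b"abcdef") for _ in range(3)]
    log.close()
    files = sorted(os.listdir(directory))
```

With a 10-byte limit, each 6-byte payload lands in its own file, so the directory ends up with two old files plus one active file.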

Each record in a data file follows a fixed layout: a CRC checksum, a timestamp, the key size, the value size, and then the key and the value themselves.
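A record layout of this kind can be sketched with Python's `struct` module. The field order (CRC, timestamp, key size, value size, key, value) follows the commonly described Bitcask format; the 32-bit field widths, the big-endian byte order, and the helper names `encode_record` and `decode_record` are assumptions chosen for the example.

```python
import struct
import zlib

# Assumed header: crc32, timestamp, key size, value size (4 bytes each, big-endian).
HEADER = struct.Struct(">IIII")


def encode_record(key, value, timestamp):
    """Serialize one record; the CRC covers everything after the CRC field."""
    body = struct.pack(">III", timestamp, len(key), len(value)) + key + value
    return struct.pack(">I", zlib.crc32(body)) + body


def decode_record(buf, offset=0):
    """Parse one record at offset; return (key, value, timestamp, next_offset)."""
    crc, timestamp, key_size, value_size = HEADER.unpack_from(buf, offset)
    key_start = offset + HEADER.size
    value_start = key_start + key_size
    end = value_start + value_size
    # Recompute the checksum over timestamp, sizes, key, and value.
    if zlib.crc32(buf[offset + 4:end]) != crc:
        raise ValueError("corrupt record")
    return buf[key_start:value_start], buf[value_start:end], timestamp, end


record = encode_record(b"user:1", b"alice", timestamp=1700000000)
key, value, timestamp, next_offset = decode_record(record)
```

Storing the sizes in the header is what lets a reader walk a file record by record: `next_offset` is where the following record begins.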

How to read data efficiently?

Bitcask uses an in‑memory hash index that records every key and the location of its latest value in the data files. A get operation looks up the key in this hash table to obtain the file, offset, and size of the value, then reads the value directly from that data file with a single positioned read.
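The lookup path can be sketched as follows. To keep the example short it makes two simplifying assumptions: there is only one data file (so the index omits a file ID), and only the raw value bytes are written rather than a full record. The function names `put` and `get` are illustrative.

```python
import os
import tempfile


def put(data_file, index, key, value):
    """Append value to the data file and record its location in the index."""
    data_file.seek(0, os.SEEK_END)
    offset = data_file.tell()
    data_file.write(value)
    index[key] = (offset, len(value))


def get(data_file, index, key):
    """Look up the location in memory, then do one positioned read."""
    location = index.get(key)
    if location is None:
        return None                 # unknown key: no disk access at all
    offset, size = location
    data_file.seek(offset)
    return data_file.read(size)


index = {}                          # stands in for the in-memory hash index
with tempfile.TemporaryFile() as data_file:
    put(data_file, index, b"a", b"apple")
    put(data_file, index, b"b", b"banana")
    found = get(data_file, index, b"b")
    missing = get(data_file, index, b"zzz")
```

Because the index already holds the exact offset and size, a read never has to scan the file.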

How are deletions and updates handled?

Bitcask does not modify files in place. Deleting a key appends a new record with the same key and a special delete marker (a tombstone) and removes the key from the hash index; the old record remains in the file but is no longer reachable. Updating a key works similarly: a new record with the updated value is appended, and the hash index is updated to point to the newest entry.
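A small in-memory model makes the behaviour concrete. Here a Python list stands in for the append-only data file and a dict for the hash index; the class name `TinyStore` and the `TOMBSTONE` sentinel are hypothetical, and removing the key from the index on delete is one reasonable way to make old records unreachable.

```python
TOMBSTONE = object()    # assumed sentinel standing in for the delete marker


class TinyStore:
    """Hypothetical model: nothing in self.log is ever modified or removed."""

    def __init__(self):
        self.log = []       # append-only list of (key, value) records
        self.index = {}     # key -> position of the newest live record

    def put(self, key, value):
        # An update is just another append; the index moves to the new record.
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1

    def delete(self, key):
        # A delete appends a tombstone and drops the key from the index.
        self.log.append((key, TOMBSTONE))
        self.index.pop(key, None)

    def get(self, key):
        position = self.index.get(key)
        return None if position is None else self.log[position][1]


store = TinyStore()
store.put("color", "red")
store.put("color", "blue")      # update
after_update = store.get("color")
store.delete("color")
after_delete = store.get("color")
records_on_disk = len(store.log)
```

All three records are still in the log after the delete, which is exactly the stale data that later has to be cleaned up.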

How does Bitcask deal with obsolete data?

Over time, many stale or deleted records accumulate, wasting space. Bitcask periodically performs a **merge** operation that scans all old data files, discards records marked as deleted, and keeps only the most recent version of each key, writing the result to new data files.
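The core of a merge can be sketched as a single pass over the old files. In this simplified model each file is a list of `(key, value)` pairs, files are given oldest first, and `None` plays the role of the delete marker; these representations and the function name `merge` are assumptions for the example. It also assumes every old file takes part in the merge, so dropping a tombstone cannot bring an older value back.

```python
TOMBSTONE = None    # assumed stand-in for the delete marker


def merge(old_files):
    """Return the compacted records: newest value per key, deletions dropped."""
    latest = {}
    for records in old_files:           # oldest file first
        for key, value in records:      # records in write order
            latest[key] = value         # a later record overrides an earlier one
    # Keys whose newest record is a tombstone are discarded entirely.
    return [(key, value) for key, value in latest.items() if value is not TOMBSTONE]


old_files = [
    [("a", "1"), ("b", "1")],
    [("a", "2"), ("b", TOMBSTONE), ("c", "3")],
]
merged = merge(old_files)
```

Five records shrink to two: the stale `a`, both records for the deleted `b`, and nothing else survive only as the newest live values.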

How to speed up hash index reconstruction?

Since the in‑memory hash index is not persisted, it must be rebuilt on startup by scanning all data files, which can be slow. Bitcask generates a **hint file** during merge operations; this file stores only the key and the location of its value, allowing the hash index to be rebuilt quickly without scanning the full data files.
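A hint file of this kind can be sketched as a sequence of small fixed headers followed by the key bytes. The header layout (key size, file ID, value offset, value size), the field widths, and the function names `write_hint` and `load_index` are assumptions for illustration, not Bitcask's exact on-disk format.

```python
import io
import struct

# Assumed hint entry header: key size, file id, value offset, value size.
HINT_HEADER = struct.Struct(">HIII")


def write_hint(out, entries):
    """Write one hint entry per key; values themselves are never copied."""
    for key, (file_id, offset, size) in entries.items():
        out.write(HINT_HEADER.pack(len(key), file_id, offset, size))
        out.write(key)


def load_index(src):
    """Rebuild the in-memory index by reading only the hint entries."""
    index = {}
    while True:
        header = src.read(HINT_HEADER.size)
        if not header:
            break                       # clean end of the hint file
        key_size, file_id, offset, size = HINT_HEADER.unpack(header)
        index[src.read(key_size)] = (file_id, offset, size)
    return index


entries = {b"user:1": (3, 0, 120), b"user:2": (3, 120, 4096)}
hint_file = io.BytesIO()                # stands in for a file on disk
write_hint(hint_file, entries)
hint_bytes = hint_file.tell()
hint_file.seek(0)
rebuilt = load_index(hint_file)
```

In this example the two values occupy 4216 bytes in the data file, while the hint entries take only 40 bytes, which is why startup from hint files is so much cheaper than a full scan.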

Overall Bitcask architecture

The system consists of three main components:

Hash index : an in‑memory hash table mapping keys to value locations for fast lookups.

Data file : an append‑only log where records are written sequentially; when a file reaches a size limit, a new active file is created, forming one active file plus N old files.

Hint file : a compact on‑disk index generated during merge operations, used to accelerate hash index reconstruction.
