Databases 19 min read

How I Built a High‑Performance Java KV Store for a Competition – Architecture and Optimizations

This article details the design, implementation, and performance tuning of a Java‑based key‑value storage engine built for a competition, covering the competition requirements, interface definitions, data layout, recovery logic, multi‑threaded read/write strategies, Direct I/O techniques, JVM tuning, and practical lessons learned.

Programmer DD
Programmer DD
Programmer DD
How I Built a High‑Performance Java KV Store for a Competition – Architecture and Optimizations

Introduction

I participated in a KV‑store competition using Java, ranked 21st, and in this article I share the overall architecture and Java‑specific best practices for building a high‑performance storage engine.

Competition Overview

The contest consists of a preliminary and a final round. Participants must implement a simplified, efficient KV store supporting write(byte[] key, byte[] value), read(byte[] key), and in the final round an additional range(byte[] lower, byte[] upper, AbstractVisitor visitor) interface.

Evaluation Criteria

Two phases are evaluated:

Correctness: the engine must survive repeated kill -9 crashes and recover without data loss.

Performance: random writes, random reads, and sequential (range) reads are measured with 64 concurrent threads.

Key and Value Design

Keys are fixed at 8 bytes, represented as long. Values are 4 KB, which aligns well with disk I/O and allows the logical offset to be stored as an int (offset = physicalOffset / 4096), halving memory usage.

Handling kill‑9

Since the test forces process termination, we use the OS page cache as a write buffer and avoid asynchronous I/O. For durability we mmap a region and flush it in 4‑value batches; this approach is faster than writing 4 KB blocks directly.

Hidden Evaluation Details

After each phase the test script clears the page cache (via sysctl -w vm.drop_caches=1/2/3) and counts that time toward the final score, making excessive page‑cache usage a penalty and highlighting Direct I/O as a “silver bullet”.

Key Distribution and Partitioning

The 64‑million keys are uniformly distributed. By hashing the high 10 bits of the key we obtain 1024 partitions, each stored in its own file. This enables converting random writes into sequential writes within a partition.

public class HighTenPartitioner implements Partitionable {
    @Override
    public int getPartition(byte[] key) {
        return ((key[0] & 0xff) << 2) | ((key[1] & 0xff) >> 6));
    }
}

Architecture Overview

Three perspectives:

Global view : 1024 partitions, each containing an index file and a data file.

Partition view : Within a partition, writes are appended sequentially; the index maps keys to logical offsets.

Memory view : Two in‑memory arrays hold key[1024][625000] and offset[1024][625000].

Random Write Flow

We allocate a write buffer per partition using mmap, batch four values before flushing, and store the index via mmap as well. No in‑memory index is maintained during the write phase; the recover phase reconstructs offsets from key order.

ByteBuffer writeBuffer = ByteBuffer.allocateDirect(4 * 4096);
// fill buffer, then force to disk

Recover Flow

On engine startup we open the data files, load keys and offsets into the two arrays, and sort each partition using 64 threads. Sorting is performed on separate key and offset arrays to avoid struct overhead and improve CPU cache utilization.

public class KeyOffset {
    long key;
    int offset;
}

Random Read Flow

For a given key we locate its partition, binary‑search the sorted key array to obtain the offset, then read the value directly from the data file.

Sequential (Range) Read Flow

Range is the performance bottleneck. We use a producer‑consumer model: four fetch threads load entire partitions into off‑heap buffers (≈256 MB each), while 64 visit threads concurrently consume the buffers and invoke the visitor callback. This design maximizes disk‑to‑memory throughput and keeps CPU busy.

public void acquire() throws InterruptedException {
    while (!permit) {
        Thread.sleep(0, 1);
    }
    permit = false;
}

Direct I/O in Java

Java lacks native Direct I/O, so we wrap the required syscalls via JNA, following the jaydio approach. Two crucial points: allocate aligned memory (via posix_memalign) and ensure writes are page‑size multiples using pread with O_DIRECT.

Direct Memory vs Heap Memory

Off‑heap buffers avoid extra copies and reduce GC pressure, especially for the large 256 MB range buffers.

JVM Tuning

-server -Xms2560m -Xmx2560m -XX:MaxDirectMemorySize=1024m -XX:NewRatio=4 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:-UseBiasedLocking

Increasing the old generation (NewRatio=4) cuts CMS GC cycles from 126 to 5, significantly improving latency.

Object Pooling

Instead of heavyweight pools we use ThreadLocal caches for frequently allocated objects (e.g., key/value wrappers), eliminating per‑iteration allocations.

Reducing Thread Switching

A custom semaphore with busy‑wait and micro‑sleep reduces contention between the 4 I/O threads and the 64 visit threads, yielding a ~6 s performance gain.

Core Binding

We pin the four I/O threads to dedicated CPU cores using the OpenHFT Affinity library, preventing scheduler interference and adding 1–2 s speedup.

FileChannel, MMAP, and Direct I/O Discussion

While FileChannel leverages the page cache, Direct I/O can be combined with limited page‑cache usage for consistency. The insights from this competition suggest that a hybrid approach can benefit production storage engines.

Conclusion

Using Java, I achieved a final score only ~10 s slower than C++ competitors across 64 million random writes, reads, and sequential reads, demonstrating that with careful architecture, Direct I/O, and JVM tuning, Java can approach native performance.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Storage EngineKV Storedirect-io
Programmer DD
Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.