
How JIMKV Unifies Cache and Storage to Power High‑Performance Distributed Databases

The article details JD Retail's JIMKV distributed database, explaining its unified cache‑storage architecture, fault‑detection and elastic‑scaling mechanisms, hot‑cold data tiering, read/write amplification mitigation, real‑world product‑detail use case, and future plans for intelligent operations and OLAP support.

dbaplus Community

Aug 5, 2020

Background and Motivation

With the exponential growth of mobile internet traffic, enterprises face increasing demands for low-latency, high-throughput data services. Disk-based storage alone, whether HDD or SSD, cannot meet these performance and scalability needs, which prompted the adoption of distributed caching to accelerate data access and relieve database hotspots.

However, introducing caching adds architectural complexity, consistency challenges, and operational overhead. To address these issues, JD Retail rebuilt its KV store, JIMDB, into a new distributed database called JIMKV, which integrates caching and storage into a single unified architecture.

Early Architecture and Core Features

1. Automatic Fault Detection and Recovery

The initial design considered Zookeeper for health checks, but deployment difficulties and performance concerns led to a custom probe solution. Multiple probe instances are deployed across different racks; if any probe reports a node as alive, the node is considered healthy. When a majority of probes deem a node dead, a recovery program triggers master‑slave failover and updates the cluster topology for all clients.
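The majority rule described above can be sketched as follows. This is a minimal illustration, not JIMKV's implementation: the names `ProbeReport`, `node_is_dead`, and `fail_over` are hypothetical, and the sketch assumes that failover is triggered only by a strict majority of "dead" votes, so a minority of failed probes (for example, a rack-local network fault) never demotes a master.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ProbeReport:
    probe_id: str  # hypothetical identifier, e.g. one probe per rack
    alive: bool    # True if this probe could reach the node


def node_is_dead(reports: List[ProbeReport]) -> bool:
    """Return True only when a strict majority of probes report the node dead."""
    if not reports:
        return False  # no evidence at all: do not trigger a failover
    dead_votes = sum(1 for r in reports if not r.alive)
    return dead_votes > len(reports) // 2


def fail_over(topology: Dict[str, str], shard: str, replica: str) -> Dict[str, str]:
    """Return a new shard -> master mapping with `replica` promoted for `shard`."""
    updated = dict(topology)  # copy, so clients still holding the old view are unaffected
    updated[shard] = replica
    return updated


reports = [
    ProbeReport("rack-a", alive=False),
    ProbeReport("rack-b", alive=False),
    ProbeReport("rack-c", alive=True),
]
topology = {"shard-1": "node-1"}
if node_is_dead(reports):
    topology = fail_over(topology, "shard-1", "node-2")
```

Requiring a strict majority means an even split (two of four probes) leaves the node in place, which errs on the side of avoiding unnecessary failovers.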

2. Automatic Elastic Scaling

To handle sudden traffic spikes, the system monitors metrics such as OPS, memory usage, and network traffic. When thresholds are exceeded, the platform automatically expands the cluster; when metrics fall below shrink thresholds, it contracts the cluster. Scaling involves migrating slots between instances, and a pre‑allocation mechanism prevents resource over‑commit during concurrent expansions.
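A threshold-driven scaling decision and a slot migration plan might look like the sketch below. Everything here is an assumption made for illustration: the `Thresholds` type, the normalization of each metric to a fraction of capacity, the rule that all metrics must be low before shrinking, and the equal-share migration policy are not taken from JIMKV. The pre-allocation mechanism is not modeled.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Thresholds:
    expand: float  # scale out when any metric exceeds this fraction of capacity
    shrink: float  # scale in only when every metric is below this fraction


def scaling_decision(utilization: Dict[str, float], limits: Thresholds) -> str:
    """Map per-metric utilization (e.g. OPS, memory, network) to an action."""
    if any(v > limits.expand for v in utilization.values()):
        return "expand"
    if utilization and all(v < limits.shrink for v in utilization.values()):
        return "shrink"
    return "hold"


def plan_slot_migration(owners: Dict[int, str], new_instance: str) -> Dict[int, str]:
    """Pick slots to move to a new, empty instance until it holds an equal share.

    `owners` maps slot number -> current instance. Slots are taken only from
    instances that hold more than the target share.
    """
    instances = set(owners.values()) | {new_instance}
    target = len(owners) // len(instances)
    load = Counter(owners.values())
    moves: Dict[int, str] = {}
    for slot in sorted(owners):
        if len(moves) >= target:
            break
        donor = owners[slot]
        if donor != new_instance and load[donor] > target:
            load[donor] -= 1
            moves[slot] = new_instance
    return moves


limits = Thresholds(expand=0.8, shrink=0.3)
action = scaling_decision({"ops": 0.9, "memory": 0.5, "network": 0.4}, limits)
moves = plan_slot_migration({0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}, "c")
```

Keeping the expand and shrink thresholds well apart gives the controller hysteresis, so a cluster hovering near one threshold does not oscillate between growing and shrinking.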

Challenges from Large‑Scale Promotions and Industry Trends

During major sales events (e.g., JD 618, Double-11), memory costs surged and purely in-memory storage became unsustainable. The single-threaded, multi-process design of JIMDB also hit CPU bottlenecks. Inspired by Google Spanner, TiDB, and CockroachDB, the team recognized the need for a scalable, NewSQL-style solution that supports both OLTP and OLAP (HTAP).

Architecture Design and Application Scenarios

1. Overall Architecture

JIMKV consists of three layers:

Master: Manages metadata, performs cluster scheduling and load balancing, and generates globally unique, monotonically increasing transaction IDs.

DS Cluster: Stores data as ranges; each range belongs to a Raft group for replication and fault tolerance. The Master balances ranges across DS nodes.

Proxy: Stateless compute layer that accepts SQL and Redis protocols, resolves data locations via the Master, forwards requests to DS nodes, and returns results. It can be horizontally scaled behind load balancers.
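To make the routing step concrete, the sketch below shows one way a proxy could map a key to the range that owns it, using a sorted table of range start keys. The `Range` and `RouteTable` types, their fields, and the node names are hypothetical; the article does not describe JIMKV's actual routing structures or how the proxy caches the metadata it obtains from the Master.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Range:
    start_key: bytes  # inclusive lower bound of the range
    raft_group: int   # hypothetical ID of the Raft group replicating this range
    leader: str       # hypothetical address of the DS node leading that group


class RouteTable:
    """Sorted view of range metadata, as a proxy might hold after asking the Master."""

    def __init__(self, ranges: List[Range]) -> None:
        self._ranges = sorted(ranges, key=lambda r: r.start_key)
        self._starts = [r.start_key for r in self._ranges]

    def locate(self, key: bytes) -> Range:
        # Find the last range whose start key is <= key.
        idx = bisect.bisect_right(self._starts, key) - 1
        if idx < 0:
            raise KeyError(key)
        return self._ranges[idx]


table = RouteTable([
    Range(b"", raft_group=1, leader="ds-1"),
    Range(b"m", raft_group=2, leader="ds-2"),
    Range(b"t", raft_group=3, leader="ds-3"),
])
target = table.locate(b"sku:1001")
```

Because the lookup is a binary search over start keys, splitting or moving a range only requires replacing entries in the table, which fits a stateless proxy that can rebuild its view from the Master at any time.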

2. Application Scenarios

JIMKV is suited for:

Data warehouses handling massive real‑time reads/writes.

Replacing MySQL warehouses where sharding or middleware is too invasive.

Cache‑accelerated data warehouses, offering Redis‑like latency with higher throughput.

Financial‑grade OLTP workloads requiring strong consistency and high availability.

Case Study: JD Product‑Detail Database

In the product-detail service, traffic is very high and individual values are large, which creates hotspot risks. JIMKV employs hot-cold tiered storage: hot data resides in memory (Masstree engine) for low latency, while cold data is persisted on disk (RocksDB or WiscKeyDB) to reduce cost.

Hot‑cold decisions are driven by configurable thresholds maxmemory and maxdisksize. When memory usage exceeds maxmemory, values are evicted to the disk engine asynchronously; when memory usage falls below 70% of maxmemory, cold values are pulled back into memory on demand.
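The threshold behavior can be illustrated with a small in-process model. This is a sketch under several assumptions that are not from the article: eviction picks the least recently used entry, eviction runs synchronously rather than asynchronously, whole entries move between tiers rather than values only, plain dictionaries stand in for the memory and disk engines, and maxdisksize is not modeled. `TieredStore` and its methods are hypothetical names.

```python
from collections import OrderedDict
from typing import Dict, Optional


class TieredStore:
    LOW_WATERMARK = 0.70  # promotion resumes below this fraction of maxmemory

    def __init__(self, maxmemory: int) -> None:
        self.maxmemory = maxmemory
        self.hot: "OrderedDict[bytes, bytes]" = OrderedDict()  # stand-in for the memory engine
        self.cold: Dict[bytes, bytes] = {}                     # stand-in for the disk engine
        self.hot_bytes = 0

    def put(self, key: bytes, value: bytes) -> None:
        self._drop(key)
        self.hot[key] = value
        self.hot_bytes += len(value)
        self._evict_if_needed()

    def get(self, key: bytes) -> Optional[bytes]:
        if key in self.hot:
            self.hot.move_to_end(key)  # mark as most recently used
            return self.hot[key]
        value = self.cold.get(key)
        if value is not None and self.hot_bytes < self.LOW_WATERMARK * self.maxmemory:
            # Memory has headroom: pull the cold value back on demand.
            del self.cold[key]
            self.hot[key] = value
            self.hot_bytes += len(value)
            self._evict_if_needed()
        return value

    def _drop(self, key: bytes) -> None:
        old = self.hot.pop(key, None)
        if old is not None:
            self.hot_bytes -= len(old)
        self.cold.pop(key, None)

    def _evict_if_needed(self) -> None:
        # Move least recently used entries to the cold tier until under maxmemory.
        while self.hot_bytes > self.maxmemory and self.hot:
            key, value = self.hot.popitem(last=False)
            self.cold[key] = value
            self.hot_bytes -= len(value)


store = TieredStore(maxmemory=10)
store.put(b"a", b"aaaa")
store.put(b"b", b"bbbb")
store.put(b"c", b"cccc")  # 12 bytes in memory exceeds maxmemory, so "a" is evicted
```

The gap between the eviction threshold (100% of maxmemory) and the promotion threshold (70%) keeps a value from bouncing between tiers when memory usage sits near the limit.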

Mitigating Read/Write Amplification

Traditional LSM-tree based stores (LevelDB, RocksDB) suffer from read, write, and space amplification. JIMKV adopts WiscKeyDB techniques to reduce amplification and extend SSD lifespan: keys stay in the LSM-tree while values are appended to a separate log file, values are fetched with parallel SSD reads, a crash-consistent garbage collector reclaims stale log space, and the write-ahead log is eliminated for small writes.
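Key-value separation can be sketched in a few lines. The example below is an illustration of the general idea only, not JIMKV's or WiscKeyDB's code: a dictionary stands in for the LSM-tree, an in-memory buffer stands in for the value log file, and the class name, record layout, and `garbage_bytes` helper are assumptions. Parallel reads, crash consistency, and the garbage collector itself are not modeled.

```python
import io
import struct
from typing import Dict, Optional, Tuple


class KVSeparatedStore:
    """Small index maps key -> (value offset, value length); values live in an append-only log."""

    _HEADER = struct.Struct("<II")  # key length, value length

    def __init__(self) -> None:
        self.vlog = io.BytesIO()                       # stand-in for the value log on SSD
        self.index: Dict[bytes, Tuple[int, int]] = {}  # stand-in for the LSM-tree

    def put(self, key: bytes, value: bytes) -> None:
        self.vlog.seek(0, io.SEEK_END)
        record_start = self.vlog.tell()
        self.vlog.write(self._HEADER.pack(len(key), len(value)) + key + value)
        # Only the small pointer goes into the index, so compaction never rewrites values.
        value_offset = record_start + self._HEADER.size + len(key)
        self.index[key] = (value_offset, len(value))

    def get(self, key: bytes) -> Optional[bytes]:
        location = self.index.get(key)
        if location is None:
            return None
        offset, length = location
        self.vlog.seek(offset)
        return self.vlog.read(length)

    def garbage_bytes(self) -> int:
        """Log bytes no longer referenced by the index; a garbage collector would reclaim these."""
        self.vlog.seek(0, io.SEEK_END)
        total = self.vlog.tell()
        live = sum(self._HEADER.size + len(k) + length
                   for k, (_, length) in self.index.items())
        return total - live


kv = KVSeparatedStore()
kv.put(b"k1", b"hello")
kv.put(b"k1", b"world!")  # the overwrite leaves the old record behind as garbage
latest = kv.get(b"k1")
```

Since overwrites only append, stale records accumulate in the log, which is why a design like this needs a garbage collector at all.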

Future Roadmap

Intelligent Operations: Explore machine-learning-driven self-healing, auto-tuning, and performance optimization.

OLAP Support: Extend MySQL compatibility to cover aggregation, joins, and analytical queries.

New Hardware Integration: Leverage NVMe, persistent memory, kernel-bypass networking and storage stacks (DPDK, RDMA, SPDK), and FPGA accelerators to further lower I/O latency and improve throughput.

By combining the Masstree (memory) and WiscKeyDB (disk) engines, JIMKV reduces storage cost by up to 75% while meeting the stringent performance requirements of JD's high-traffic services.
