FPGA-Accelerated X-Engine Storage Engine for High‑Performance OLTP
This article presents the design, implementation, and evaluation of X‑Engine, a next‑generation LSM‑Tree based storage engine that offloads compaction to an FPGA. For write‑intensive OLTP workloads, the offload improves throughput by up to 50% at the KV interface and up to 40% at the SQL interface.
X‑Engine is a new storage engine developed by Alibaba's Database Division as the foundation of the distributed database X‑DB. Its goal is ten times the performance of MySQL at one‑tenth of the storage cost, which it pursues by tightly integrating software and hardware techniques, including FPGA acceleration for compaction.
Alibaba's OLTP workload generates billions of writes daily, with a high write‑to‑read ratio and access concentrated on hot data, which makes efficient reclamation of multi‑version data a critical challenge. X‑Engine adopts the LSM‑Tree design: it maintains multiple memtables, flushes them to SSTables, and periodically compacts SSTables to merge key‑value pairs and reclaim space.
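The merge step of compaction can be illustrated with a minimal sketch. It is not X‑Engine's implementation: the record layout `(key, sequence, value)`, the use of `None` as a delete marker, and the function name `compact` are all assumptions made for illustration, and the sketch ignores snapshots that a real multi‑version engine would have to respect.

```python
import heapq
from typing import List, Optional, Tuple

# Hypothetical record layout: (key, sequence number, value).
# A value of None stands in for a delete marker (tombstone).
Record = Tuple[str, int, Optional[str]]


def compact(runs: List[List[Record]], drop_tombstones: bool = True) -> List[Record]:
    """Merge sorted runs and keep only the newest version of each key.

    Each run must already be sorted by key ascending, then by sequence
    number descending, so the newest version of a key comes first.
    """
    merged = heapq.merge(*runs, key=lambda r: (r[0], -r[1]))
    out: List[Record] = []
    last_key = None
    for key, seq, value in merged:
        if key == last_key:
            continue  # an older version of a key we already emitted
        last_key = key
        if value is None and drop_tombstones:
            continue  # the key was deleted; reclaim its space
        out.append((key, seq, value))
    return out


newer = [("a", 5, "a2"), ("c", 6, None)]
older = [("a", 1, "a1"), ("b", 2, "b1"), ("c", 3, "c1")]
print(compact([newer, older]))
# [('a', 5, 'a2'), ('b', 2, 'b1')]
```

Dropping tombstones is only safe when no older run outside the merge can still hold the deleted key, so the sketch exposes it as a flag rather than assuming it.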
Compaction is CPU‑intensive and causes performance jitter. To mitigate this, X‑Engine offloads compaction to dedicated FPGA hardware. The FPGA design follows a hybrid architecture where the FPGA acts as a co‑processor connected via PCIe, allowing asynchronous task scheduling and high‑throughput data transfer via DMA.
The FPGA compaction unit (CU) consists of decoders, KV ring buffers, a KV transfer module, a compaction processing engine, encoders, and a controller. The controller coordinates the pipeline stages, which carry out the decode, merge, and encode operations, and balances the throughput differences among those stages.
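A software analogy may help picture the decode, merge, and encode pipeline. The sketch below is an assumption‑laden model, not the hardware design: threads stand in for pipeline stages, bounded `queue.Queue` objects stand in for the KV ring buffers, and the `key|seq|value` byte format and all function names are hypothetical.

```python
import queue
import threading
from typing import List

DONE = None  # sentinel marking the end of a stream


def decoder(block: List[bytes], out: queue.Queue) -> None:
    """Decode 'key|seq|value' records (a made-up format) into tuples."""
    for raw in block:
        key, seq, value = raw.decode().split("|")
        out.put((key, int(seq), value))  # blocks while the ring buffer is full
    out.put(DONE)


def merger(inputs: List[queue.Queue], out: queue.Queue) -> None:
    """Multi-way merge that forwards only the newest version of each key."""
    heads = [q.get() for q in inputs]
    last_key = None
    while any(h is not DONE for h in heads):
        live = (i for i, h in enumerate(heads) if h is not DONE)
        # Smallest key first; for equal keys, the highest sequence number.
        i = min(live, key=lambda j: (heads[j][0], -heads[j][1]))
        kv = heads[i]
        heads[i] = inputs[i].get()
        if kv[0] != last_key:
            out.put(kv)
            last_key = kv[0]
    out.put(DONE)


def encoder(inp: queue.Queue, result: List[bytes]) -> None:
    """Re-encode merged tuples into the output block."""
    while (kv := inp.get()) is not DONE:
        result.append(f"{kv[0]}|{kv[1]}|{kv[2]}".encode())


def run_cu(blocks: List[List[bytes]], ring_size: int = 4) -> List[bytes]:
    """Wire the stages together; ring_size bounds every inter-stage buffer."""
    rings = [queue.Queue(maxsize=ring_size) for _ in blocks]
    merged: queue.Queue = queue.Queue(maxsize=ring_size)
    result: List[bytes] = []
    threads = [threading.Thread(target=decoder, args=(b, r))
               for b, r in zip(blocks, rings)]
    threads.append(threading.Thread(target=merger, args=(rings, merged)))
    threads.append(threading.Thread(target=encoder, args=(merged, result)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

The bounded buffers are what absorb the speed differences between stages: a fast decoder simply blocks on a full buffer until the merger catches up, which is the role the text assigns to the ring buffers and controller.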
An asynchronous scheduling framework places compaction tasks into a Task Queue, assigns them to available CUs, and collects results in a Finished Queue, reducing CPU thread‑switch overhead and improving overall system stability.
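The queue structure described here can be sketched as follows. This is a simplified model under stated assumptions: worker threads play the role of CUs, `execute` is a hypothetical stand‑in for running one compaction task, and the function name `run_scheduler` is invented for the example.

```python
import queue
import threading
from typing import Any, Callable, Dict, List


def run_scheduler(tasks: List[Any], num_cus: int,
                  execute: Callable[[Any], Any]) -> Dict[int, Any]:
    """Distribute tasks over num_cus workers and gather their results."""
    task_q: queue.Queue = queue.Queue()      # the Task Queue
    finished_q: queue.Queue = queue.Queue()  # the Finished Queue

    # The submitting thread only enqueues work; it never waits on a CU.
    for task_id, payload in enumerate(tasks):
        task_q.put((task_id, payload))

    def cu_worker() -> None:
        # Each simulated CU pulls the next task as soon as it is free.
        while True:
            try:
                task_id, payload = task_q.get_nowait()
            except queue.Empty:
                return
            finished_q.put((task_id, execute(payload)))

    workers = [threading.Thread(target=cu_worker) for _ in range(num_cus)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    # Results arrive in completion order, so key them by task id.
    results: Dict[int, Any] = {}
    while not finished_q.empty():
        task_id, output = finished_q.get()
        results[task_id] = output
    return results


print(run_scheduler([[3, 1, 2], [9, 8]], num_cus=2, execute=sorted))
# {0: [1, 2, 3], 1: [8, 9]}
```

Because submission and collection are decoupled, the thread that creates tasks does not have to block for the duration of each compaction, which is the source of the reduced thread‑switch overhead mentioned above.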
Fault‑tolerance mechanisms include CRC checks for data integrity, duplicate CUs that detect bit flips, and input validation that rejects illegal KV lengths. Any task that fails one of these checks is re‑executed on the CPU.
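The three checks and the CPU fallback might be combined along the following lines. Everything named here is hypothetical: `dma_transfer`, `primary_cu`, `shadow_cu`, and `cpu_compact` are caller‑supplied stand‑ins, `MAX_KV_LEN` is an arbitrary limit, and the source does not say at which point each check runs, so the ordering is an assumption.

```python
import zlib
from typing import Callable, List, Tuple

MAX_KV_LEN = 1 << 20  # hypothetical upper bound on a single KV record

Records = List[bytes]
Step = Callable[[Records], Records]


class CompactionFault(Exception):
    """Raised when any check on the accelerated path fails."""


def validate_input(records: Records) -> None:
    # Reject illegal lengths before they reach the accelerator.
    for r in records:
        if not 0 < len(r) <= MAX_KV_LEN:
            raise CompactionFault(f"illegal KV length: {len(r)}")


def checked_compaction(records: Records, dma_transfer: Step, primary_cu: Step,
                       shadow_cu: Step, cpu_compact: Step) -> Tuple[Records, str]:
    """Run a task on the accelerated path; fall back to the CPU on any fault."""
    try:
        validate_input(records)
        host_crc = zlib.crc32(b"".join(records))
        device_copy = dma_transfer(records)
        if zlib.crc32(b"".join(device_copy)) != host_crc:
            raise CompactionFault("CRC mismatch after transfer")
        primary = primary_cu(device_copy)
        shadow = shadow_cu(device_copy)
        if primary != shadow:
            # Two CUs given the same input disagree: suspect a bit flip.
            raise CompactionFault("duplicate CUs disagree")
        return primary, "fpga"
    except CompactionFault:
        # Re-execute from the host's own copy of the input.
        return cpu_compact(records), "cpu"


def identity(recs: Records) -> Records:
    return list(recs)


data = [b"b", b"a"]
print(checked_compaction(data, identity, sorted, sorted, sorted))
# ([b'a', b'b'], 'fpga')
print(checked_compaction(data, identity, sorted, identity, sorted))
# ([b'a', b'b'], 'cpu')
```

Running every task on two CUs halves the available accelerator capacity in this naive form; the sketch only shows the comparison logic, not how such a check would be scheduled in practice.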
Experimental evaluation on a 64‑core Intel Xeon server with a Xilinx VU9P FPGA shows that FPGA‑accelerated compaction consistently outperforms single‑threaded CPU compaction across all KV sizes, delivering up to ten‑fold throughput improvement. Benchmarks (DbBench, YCSB, TPC‑C, SysBench) demonstrate 40‑50% throughput gains for write‑only and mixed workloads, with reduced performance jitter compared to the CPU‑only engine.
In conclusion, FPGA‑accelerated X‑Engine provides significant performance improvements for write‑intensive workloads, achieves high availability through robust fault‑tolerance, and validates the viability of heterogeneous CPU‑FPGA designs for modern high‑throughput database systems.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.