TXRocks Storage Engine: Architecture, Optimization, and Performance Analysis
TXRocks is a transactional storage engine based on RocksDB that offers significant space savings compared to InnoDB while maintaining comparable performance. To address RocksDB's read performance limitations, it adds optimizations such as sum operator pushdown and an AEP-based secondary cache.
This article introduces TXRocks, a transactional storage engine built on RocksDB by Tencent's TXSQL team. It is organized into four main parts: an introduction to the engine, optimization techniques, performance testing, and a summary.
The first section explains TXRocks' foundation on RocksDB, a high-performance persistent KV store originally developed by Facebook. TXRocks inherits RocksDB's LSM Tree storage structure, which avoids the page fragmentation typical of InnoDB's B+Tree pages and allows compact storage formats. This results in space savings of 50% or more compared to InnoDB while maintaining similar performance levels, making it well suited to transaction-heavy workloads with large data volumes.
The article then delves into RocksDB's LSM Tree architecture, explaining how data is organized into in-memory MemTables and disk-based SST files across multiple levels (L0-L6). It details the write process, read operations, and compaction mechanisms that maintain optimal tree structure. The space-saving advantages of LSM Tree over B+Tree are highlighted, including reduced page fragmentation, prefix compression, and lower transaction overhead.
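The write, flush, and compaction flow described above can be sketched as a toy model. This is a minimal illustration, not RocksDB's implementation: the class name ToyLSM, the tiny size limits, and the use of only two on-disk levels are assumptions made for the example, and deletes (tombstones), the write-ahead log, and Bloom filters are left out.

```python
import bisect


def _search(run, key):
    """Binary-search a sorted run of (key, value) pairs; None if absent."""
    i = bisect.bisect_left(run, (key,))
    if i < len(run) and run[i][0] == key:
        return run[i][1]
    return None


class ToyLSM:
    """Toy LSM tree: a MemTable, overlapping L0 runs, and one sorted L1 run."""

    def __init__(self, memtable_limit=2, l0_limit=2):
        self.memtable = {}              # in-memory writes land here first
        self.memtable_limit = memtable_limit
        self.l0_limit = l0_limit
        self.l0 = []                    # newest run first; key ranges may overlap
        self.l1 = []                    # one sorted run with unique keys

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # A full MemTable is written out as an immutable sorted run in L0.
        self.l0.insert(0, sorted(self.memtable.items()))
        self.memtable = {}
        if len(self.l0) >= self.l0_limit:
            self._compact()

    def _compact(self):
        # Merge L0 into L1; apply the oldest runs first so newer values win.
        merged = dict(self.l1)
        for run in reversed(self.l0):
            merged.update(run)
        self.l1 = sorted(merged.items())
        self.l0 = []

    def get(self, key):
        # Newest data first: MemTable, then each L0 run, then L1.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.l0:
            value = _search(run, key)
            if value is not None:
                return value
        return _search(self.l1, key)
```

For instance, after `db = ToyLSM()` and four `put` calls that overwrite one key, the second flush triggers a compaction and `db.get` returns the newest value from L1. Writes never modify an existing run in place, which is the property the article contrasts with InnoDB's page updates.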
Performance characteristics are discussed, including RocksDB's lower write amplification and better SSD compatibility compared to InnoDB's in-place modification approach. However, the article acknowledges RocksDB's inherent read performance limitations due to its layered structure, where reads must traverse multiple levels and may require searching multiple SST files in L0.
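To make the read cost concrete, the sketch below lists which SST files a point lookup might have to open. It assumes each file is described only by a hypothetical (min_key, max_key) range, that L0 ranges may overlap, and that every deeper level is sorted and non-overlapping. Real RocksDB also consults the MemTable and Bloom filters first, which this sketch ignores.

```python
import bisect


def files_to_probe(levels, key):
    """Return (level_name, file_index) pairs a lookup for `key` may touch.

    levels[0] is L0: a list of (min_key, max_key) ranges that may overlap.
    levels[1:] are deeper levels: sorted lists of non-overlapping ranges.
    """
    probes = []
    # L0 files can overlap, so every file whose range covers the key is a candidate.
    for i, (lo, hi) in enumerate(levels[0]):
        if lo <= key <= hi:
            probes.append(("L0", i))
    # Deeper levels are sorted, so binary search finds at most one file per level.
    for n, level in enumerate(levels[1:], start=1):
        starts = [lo for lo, _ in level]
        idx = bisect.bisect_right(starts, key) - 1
        if idx >= 0 and key <= level[idx][1]:
            probes.append((f"L{n}", idx))
    return probes


levels = [
    [("a", "m"), ("c", "z"), ("x", "z")],   # L0: overlapping ranges
    [("a", "f"), ("g", "p"), ("q", "z")],   # L1
    [("a", "c"), ("d", "k"), ("l", "z")],   # L2
]
print(files_to_probe(levels, "e"))
```

In this made-up layout a lookup for "e" may touch two L0 files plus one file in each deeper level, whereas a B+Tree lookup follows a single root-to-leaf path.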
The data dictionary section explains MyRocks' metadata management using column families, index IDs, and various data dictionary types stored in the '__system__' column family. It describes the record format for both primary key and non-primary key tables, as well as secondary key structures.
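As a rough illustration of how an index ID can prefix every key, the sketch below encodes primary and secondary keys as byte strings. It assumes a 4-byte big-endian index ID and unsigned 64-bit integer columns only. The function names and the index IDs 260 and 261 are hypothetical, and the real memcomparable format also handles signed numbers, strings, and NULLs.

```python
import struct


def encode_pk_key(index_id, pk):
    """Primary-key record key: index ID followed by the primary key value."""
    return struct.pack(">I", index_id) + struct.pack(">Q", pk)


def encode_sk_key(index_id, sk, pk):
    """Secondary-key record key: index ID, secondary column, then primary key."""
    return (struct.pack(">I", index_id)
            + struct.pack(">Q", sk)
            + struct.pack(">Q", pk))


# Big-endian encoding keeps byte-wise order equal to numeric order, so all
# rows of one index are stored contiguously and sorted by key.
keys = [encode_pk_key(260, 10), encode_pk_key(261, 0), encode_pk_key(260, 2)]
for key in sorted(keys):
    print(key.hex())
```

Appending the primary key to the secondary key makes every secondary entry unique and lets a lookup go from the secondary index to the row's primary-key record.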
The optimization section presents several techniques: sum operator pushdown to reduce SQL layer overhead, AEP (Intel's non-volatile memory) as a secondary cache to improve read performance, and hotspot data identification algorithms. The AEP implementation uses AppDirect mode, prioritizes cached data by LSM level, and applies an LRU algorithm to the largest level. Tests with mixed DRAM/AEP configurations show significant performance improvements.
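The idea of a secondary cache behind DRAM can be sketched as a two-tier structure. This is a simplified model under stated assumptions: both tiers use plain LRU, the level-based priority mentioned above is not implemented, and the class name TieredCache, its capacities, and the load_from_disk callback are all hypothetical rather than TXRocks or RocksDB APIs.

```python
from collections import OrderedDict


class TieredCache:
    """Two-tier LRU cache: a small fast tier (DRAM) backed by a larger one (AEP)."""

    def __init__(self, dram_capacity, aep_capacity):
        self.dram = OrderedDict()
        self.aep = OrderedDict()
        self.dram_capacity = dram_capacity
        self.aep_capacity = aep_capacity

    def get(self, key, load_from_disk):
        """Return (value, tier), where tier names the place that served the read."""
        if key in self.dram:
            self.dram.move_to_end(key)      # mark as most recently used
            return self.dram[key], "dram"
        if key in self.aep:
            value = self.aep.pop(key)       # promote back into DRAM
            self._insert_dram(key, value)
            return value, "aep"
        value = load_from_disk(key)         # miss in both tiers
        self._insert_dram(key, value)
        return value, "disk"

    def _insert_dram(self, key, value):
        self.dram[key] = value
        self.dram.move_to_end(key)
        if len(self.dram) > self.dram_capacity:
            # Demote the coldest DRAM entry to AEP instead of dropping it.
            old_key, old_value = self.dram.popitem(last=False)
            self.aep[old_key] = old_value
            if len(self.aep) > self.aep_capacity:
                self.aep.popitem(last=False)  # AEP is full: drop its coldest entry
```

An entry evicted from DRAM gets a second chance in the slower but larger tier, so a later read is served from AEP instead of going back to the SST files on disk.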
Performance testing compares TXRocks with InnoDB across various configurations, showing that TXRocks generally offers better write performance but slightly lower read performance. The article notes that TXRocks is sensitive to the memory allocator and recommends jemalloc over glibc's default allocator to avoid memory fragmentation issues. Compression ratio tests demonstrate TXRocks' superior space efficiency: the space usage ratio of TXRocks to compressed InnoDB to uncompressed InnoDB is approximately 1:1.41:2.78.
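The ratio can be turned into percentage savings with a few lines of arithmetic. The sketch reads 1:1.41:2.78 as TXRocks : compressed InnoDB : uncompressed InnoDB; the dictionary labels and the helper name are chosen for this example.

```python
# Relative space usage from the compression test, with TXRocks as the unit.
ratios = {
    "TXRocks": 1.0,
    "InnoDB (compressed)": 1.41,
    "InnoDB (uncompressed)": 2.78,
}


def savings_vs(baseline):
    """Fraction of the baseline's space that TXRocks saves."""
    return 1 - ratios["TXRocks"] / ratios[baseline]


for name in ("InnoDB (compressed)", "InnoDB (uncompressed)"):
    print(f"{name}: TXRocks saves {savings_vs(name):.1%}")
```

Under that reading, TXRocks uses about 29% less space than compressed InnoDB and about 64% less than uncompressed InnoDB, which is consistent with the savings of 50% or more cited earlier for the uncompressed case.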
The final section summarizes TXRocks' suitability for cost-sensitive, write-heavy and read-light workloads with transaction requirements, such as historical data storage. The engine is already deployed in production environments such as WeChat Red Packets and is being offered to external customers through Tencent's cloud services.
Tencent Database Technology
Tencent's Database R&D team supports internal services such as WeChat Pay, WeChat Red Packets, Tencent Advertising, and Tencent Music, and provides external support on Tencent Cloud for TencentDB products like CynosDB, CDB, and TDSQL. This public account aims to promote and share professional database knowledge, growing together with database enthusiasts.