How CDSBen Bridges Database Transactions and Storage I/O for Accurate Cloud‑Native Benchmarking
This article introduces the CDSBen model, a machine‑learning‑based benchmark that translates real database transaction patterns into storage‑layer I/O workloads, enabling precise, isolated performance testing of the cloud‑native veDB storage system.
Background
With explosive business growth and the maturing of cloud‑native technology, many cloud‑native distributed databases have emerged. Some OLTP‑oriented databases gain their elasticity from an architecture that separates compute from storage.
Benchmarking the underlying storage of such databases with real end‑to‑end workloads faces two main difficulties:
There is no de‑facto “standard” benchmark model for database‑specific storage systems, unlike fio for general storage.
Database‑specific storage differs significantly from classic storage; treating it as classic storage in a benchmark ignores database characteristics, leading to a “gap” between end‑to‑end workload and storage behavior.
veDB Overview
veDB is Volcano Engine’s cloud‑native distributed database built on a compute‑storage separation architecture for OLTP scenarios. Its goals are high elasticity, cost‑effectiveness, ease of use, and high reliability/availability.
The system architecture consists of three layers:
Access layer: authentication, flow control, read/write routing.
Compute layer: fully compatible with MySQL and PostgreSQL; supports DML, DDL, and transactions.
Storage layer: a dedicated distributed storage system (veDB DBStore) that can be plugged into different database engines.
Benchmark Challenges
Traditional end‑to‑end benchmarks (TPC, sysbench) execute SQL statements, so they cannot isolate the storage layer: the storage layer cannot process SQL directly. Modified YCSB‑like tools can benchmark the storage layer on its own, but the I/O patterns they generate are unrelated to real database transactions.
CDSBen Model
To address these issues, the Volcano Engine database team proposed the CDSBen model, which uses machine‑learning methods to predict the storage‑layer I/O pattern from real database transaction patterns.
CDSBen consists of two learning models:
An IOPS‑sequence prediction model based on a recurrent neural network.
A joint distribution prediction model based on a random forest, used to predict the target address (PageStore segment ID) and write size of read/write requests.
Workflow:
Select one or more real business scenarios and extract workload features from veDB’s compute‑layer and storage‑layer logs to train the models.
Input compute‑layer workload features; CDSBen predicts the corresponding storage‑layer features.
Generate concrete read/write requests with a modified YCSB and run them directly on veDB DBStore for benchmarking.
Feature Extraction and Model Training
Compute‑layer workload is represented by TPS of each transaction type (e.g., SELECT, INSERT, UPDATE, DELETE). Storage‑layer workload consists of read/write requests to PageStore segments; read requests are treated as writes with zero data size. A two‑dimensional array records the distribution of requests over target addresses and write sizes.
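The feature extraction described above can be sketched in a few lines of Python. This is a minimal illustration, not the CDSBen implementation: the log record layout `(op, segment_id, size)`, the size bucket boundaries in `SIZE_BUCKETS`, and the helper names are all assumptions made for the example.

```python
from collections import Counter

# Transaction types used as the compute-layer feature order (from the article's examples).
TXN_TYPES = ["SELECT", "INSERT", "UPDATE", "DELETE"]

# Hypothetical write-size bucket upper bounds in bytes; bucket 0 holds size 0 (reads).
SIZE_BUCKETS = [0, 4096, 16384, 65536]


def tps_vector(txn_counts, window_seconds):
    """Compute-layer feature: transactions per second for each transaction type."""
    return [txn_counts.get(t, 0) / window_seconds for t in TXN_TYPES]


def size_bucket(size):
    """Index of the first bucket whose upper bound covers `size`."""
    for i, upper in enumerate(SIZE_BUCKETS):
        if size <= upper:
            return i
    return len(SIZE_BUCKETS) - 1  # clamp oversized writes into the last bucket


def joint_distribution(requests, num_segments):
    """Storage-layer feature: fraction of requests per (segment, size bucket).

    `requests` is an iterable of (op, segment_id, size) tuples. Reads are
    counted as writes of size zero, as the article describes.
    """
    counts = Counter()
    total = 0
    for op, segment_id, size in requests:
        effective_size = 0 if op == "read" else size
        counts[(segment_id, size_bucket(effective_size))] += 1
        total += 1
    return [
        [counts[(seg, b)] / total if total else 0.0 for b in range(len(SIZE_BUCKETS))]
        for seg in range(num_segments)
    ]
```

Each row of the returned array corresponds to one PageStore segment and each column to one write-size bucket, so the entries sum to 1 over the whole array.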
After feature extraction, the two models are trained. Given a desired compute‑layer workload vector, CDSBen predicts the storage‑layer IOPS sequence and joint distribution, which are then used by YCSB to generate realistic I/O for performance testing.
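As a rough sketch of that last step, the snippet below turns a predicted IOPS sequence and joint distribution into concrete requests by weighted sampling. It stands in for the modified YCSB only conceptually: the function name, the one-second interval, and the use of one representative size per bucket are assumptions for illustration.

```python
import random


def generate_requests(iops_sequence, joint_dist, bucket_sizes, seed=0):
    """Yield (second, op, segment_id, size) tuples.

    iops_sequence: predicted number of requests in each one-second interval.
    joint_dist:    joint_dist[segment][bucket] = predicted probability.
    bucket_sizes:  representative write size per bucket; size 0 marks a read.
    """
    rng = random.Random(seed)  # fixed seed keeps a benchmark run reproducible
    cells = [(seg, b) for seg, row in enumerate(joint_dist) for b in range(len(row))]
    weights = [joint_dist[seg][b] for seg, b in cells]
    for second, iops in enumerate(iops_sequence):
        # Draw `iops` (segment, bucket) pairs according to the joint distribution.
        for seg, b in rng.choices(cells, weights=weights, k=iops):
            size = bucket_sizes[b]
            op = "read" if size == 0 else "write"
            yield (second, op, seg, size)


if __name__ == "__main__":
    predicted_iops = [3, 0, 2]              # hypothetical values
    predicted_dist = [[0.5, 0.0], [0.0, 0.5]]
    for request in generate_requests(predicted_iops, predicted_dist, [0, 4096]):
        print(request)
```

Sampling the segment and size jointly, rather than independently, preserves any correlation between where requests land and how large they are.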
Advantages
Accuracy – validated by experimental results (see Experimental Results).
Flexibility and ease of use – CDSBen can run directly on the storage layer like YCSB, without deploying a compute layer.
What‑if analysis – changing the TPS or the transaction mix in the input shows how the storage‑layer workload would respond.
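A what‑if input could be derived from a baseline workload as in the sketch below. The baseline TPS values and both helper functions are hypothetical; the resulting vectors would be passed to the trained models in place of the measured workload.

```python
def scale_tps(workload, factor):
    """Scale every transaction type's TPS by the same factor (e.g. a traffic spike)."""
    return {txn: tps * factor for txn, tps in workload.items()}


def shift_mix(workload, source, target, fraction):
    """Move `fraction` of the source type's TPS to the target type.

    Total TPS stays constant; only the transaction mix changes.
    """
    moved = workload[source] * fraction
    shifted = dict(workload)
    shifted[source] -= moved
    shifted[target] = shifted.get(target, 0.0) + moved
    return shifted


# Hypothetical baseline workload, in transactions per second.
baseline = {"SELECT": 800.0, "INSERT": 100.0, "UPDATE": 80.0, "DELETE": 20.0}

double_traffic = scale_tps(baseline, 2.0)                     # same mix, twice the load
write_heavy = shift_mix(baseline, "SELECT", "UPDATE", 0.25)   # same load, more updates
```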
Experimental Results
Compared with YCSB, CDSBen‑generated requests produce performance measurements much closer to real online traffic. In a test with a production workload (named SYNC), the predicted IOPS curve matches the actual IOPS curve with high overlap; the average IOPS values are 999 (predicted) versus 1046 (actual).
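To put the two averages in perspective, the relative difference can be computed directly. The helper below is only an illustration of the arithmetic; it compares the reported means and says nothing about how closely the two curves track each other over time.

```python
def relative_error(predicted, actual):
    """Absolute difference between prediction and measurement, relative to the measurement."""
    return abs(predicted - actual) / actual


# Average IOPS reported for the SYNC workload.
error = relative_error(999, 1046)
print(f"{error:.1%}")  # 4.5%
```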
Conclusion
Without CDSBen, benchmarking veDB DBStore under realistic workloads would require the compute layer, making isolated storage testing impossible. CDSBen bridges the gap between transaction patterns and storage I/O patterns, enabling precise storage tuning and stable performance for each veDB release.
ByteDance Cloud Native
Sharing ByteDance's cloud-native technologies, technical practices, and developer events.