LibraDB Execution Engine Architecture Evolution and Optimization
LibraDB, the column-store replica of TDSQL MySQL, has evolved its execution engine from a simple scatter-gather model into a vectorized SMP pipeline that integrates MPP parallelism, asynchronous I/O, SIMD-accelerated aggregation and join operators, work-stealing, and runtime filters, fully exploiting CPU, memory, network, and disk resources for both OLTP and analytical queries.
This article introduces LibraDB's performance-driven design and traces each stage of that evolution.
Background: TDSQL is an enterprise-grade distributed SQL database known for high-throughput OLTP. Its row-store engine excels at online transactions, but analytical workloads suffer under the tuple-at-a-time volcano execution model and row-wise storage.
Basic Concepts:
SQL parsing, logical and physical planning, and column/row storage optimization.
Physical plan fragments (Fragment) are split into MPP tasks that run on multiple nodes.
Each MPP task is further divided into Pipelines, which consist of a chain of operators from Source to Sink.
Pipeline:
A Pipeline is a sequence of operators. Sources read data (local files, remote tables, or upstream MPP tasks), sinks output data (to remote exchange or downstream pipelines), and intermediate operators process chunks. Execution proceeds by pulling chunks from the previous operator and pushing results to the next.
Pipe: A Pipeline is instantiated into multiple Pipes according to the degree of parallelism (dop); each Pipe is an independently schedulable unit.
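The pull/push chunk flow and the Pipeline-to-Pipe split can be sketched as follows. This is a minimal illustration, not LibraDB's actual code: the class names (`Source`, `FilterOp`, `Sink`, `build_pipes`), the chunk size, and the even-number predicate are all hypothetical.

```python
from typing import Callable, Iterator, List

Chunk = List[int]  # a chunk is a small batch of column values (illustrative)

class Source:
    """Source operator: yields fixed-size chunks from an in-memory column."""
    def __init__(self, column: List[int], chunk_size: int = 4):
        self.column, self.chunk_size = column, chunk_size
    def chunks(self) -> Iterator[Chunk]:
        for i in range(0, len(self.column), self.chunk_size):
            yield self.column[i:i + self.chunk_size]

class FilterOp:
    """Intermediate operator: pulls a chunk from upstream, pushes the
    filtered chunk downstream."""
    def __init__(self, upstream, predicate: Callable[[int], bool]):
        self.upstream, self.predicate = upstream, predicate
    def chunks(self) -> Iterator[Chunk]:
        for chunk in self.upstream.chunks():
            yield [v for v in chunk if self.predicate(v)]

class Sink:
    """Sink operator: collects results (stands in for a network exchange
    or a downstream pipeline)."""
    def __init__(self, upstream):
        self.upstream = upstream
    def run(self) -> List[int]:
        out: List[int] = []
        for chunk in self.upstream.chunks():
            out.extend(chunk)
        return out

def build_pipes(column: List[int], dop: int) -> List[Sink]:
    """Instantiate one logical Pipeline into `dop` Pipes, each owning a
    disjoint slice of the input -- each Pipe is an independently
    schedulable Source -> Filter -> Sink chain."""
    step = (len(column) + dop - 1) // dop
    pipes = []
    for p in range(dop):
        src = Source(column[p * step:(p + 1) * step])
        pipes.append(Sink(FilterOp(src, lambda v: v % 2 == 0)))
    return pipes

if __name__ == "__main__":
    pipes = build_pipes(list(range(16)), dop=4)
    print([pipe.run() for pipe in pipes])
```

In the real engine each Pipe would run on a worker thread; here the Pipes are simply materialized to show that they partition the work without overlapping.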
Execution Model Evolution:
v1.0 Scatter‑Gather: A simple two‑stage model where multiple LibraDB nodes perform pre‑aggregation (Scatter) and a single TDSQL node merges results (Gather). Example query: select l_orderkey from lineitem group by l_orderkey;
v2.0 MPP Parallel Model: Introduces Sender and Receiver operators. The TDSQL engine splits the query into MPP tasks, which are dispatched to LibraDB nodes for parallel execution. Example join query: select * from lineitem join orders on l_orderkey = o_orderkey;
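For the join above, the Sender must route rows so that matching join keys land on the same MPP task. A hash-partitioning sketch under assumed names (`hash_partition` is illustrative, not LibraDB's API):

```python
def hash_partition(rows, key, num_tasks):
    """Sender side of an MPP exchange: route each row to the task that
    owns its key, so rows with equal join keys (e.g. l_orderkey and
    o_orderkey) meet on the same node and can be joined locally."""
    buckets = [[] for _ in range(num_tasks)]
    for row in rows:
        buckets[hash(row[key]) % num_tasks].append(row)
    return buckets

# Both sides of the join are partitioned with the same hash function:
lineitem = [{"l_orderkey": k, "qty": k * 10} for k in range(8)]
orders = [{"o_orderkey": k, "status": "OK"} for k in range(8)]
l_parts = hash_partition(lineitem, "l_orderkey", 3)
o_parts = hash_partition(orders, "o_orderkey", 3)
# For every task i, each l_orderkey in l_parts[i] finds its matching
# o_orderkey in o_parts[i], so no cross-node probe is needed.
```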
v3.0 SMP Pipeline Model: Replaces the volcano model with a vectorized pipeline. Pipelines are built from the physical plan, split into Pipes, and scheduled on a work‑thread pool. This model improves CPU utilization, cache locality, and reduces context‑switch overhead.
Pipeline Construction: From the physical plan, LibraDB builds Pipelines (e.g., a pre-aggregation pipeline and a final-aggregation pipeline for TPC-H Q1) and derives the degree of parallelism from the number of CPU cores.
Asynchronous Blocking Operations: To avoid CPU stalls caused by disk I/O or network back-pressure, LibraDB separates CPU work from I/O work: Block Cache queues decouple disk reads, and network buffer queues decouple data sends.
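The CPU/I/O decoupling can be sketched with a bounded queue standing in for the Block Cache: an I/O thread fills it while a CPU thread drains it, so compute never issues a blocking read and the bounded capacity provides back-pressure. All names here (`io_reader`, `cpu_worker`, the fake 4-row blocks) are illustrative assumptions.

```python
import queue
import threading

def io_reader(block_ids, block_cache):
    """I/O thread: fetches disk blocks (simulated here) and fills the
    Block Cache queue; the CPU side never issues blocking reads itself."""
    for bid in block_ids:
        block_cache.put(list(range(bid * 4, bid * 4 + 4)))  # fake 4-row block
    block_cache.put(None)  # end-of-stream marker

def cpu_worker(block_cache):
    """CPU thread: pops decoded blocks and does pure compute (a sum here)."""
    total = 0
    while (block := block_cache.get()) is not None:
        total += sum(block)
    return total

def run(block_ids, cache_capacity=2):
    # A bounded queue: when the CPU falls behind, put() blocks the I/O
    # thread instead of the compute thread -- back-pressure for free.
    cache = queue.Queue(maxsize=cache_capacity)
    t = threading.Thread(target=io_reader, args=(block_ids, cache))
    t.start()
    total = cpu_worker(cache)
    t.join()
    return total
```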
Data Skew Handling:
Work‑stealing allows idle pipes to steal tasks from overloaded pipes.
Local Exchange redistributes data across pipes when the partition count is lower than the core count.
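Work-stealing under skew can be sketched with per-pipe deques: each worker drains its own queue from the front and, when idle, steals from the tail of the busiest peer. A coarse global lock and the function names are simplifying assumptions, not LibraDB's scheduler.

```python
import collections
import threading

def run_with_stealing(per_pipe_tasks, num_workers):
    """Each worker drains its own deque from the front; when the deque is
    empty it steals from the back of the busiest peer's deque."""
    deques = [collections.deque(tasks) for tasks in per_pipe_tasks]
    lock = threading.Lock()  # one big lock keeps the sketch simple
    results = [0] * num_workers

    def worker(wid):
        while True:
            task = None
            with lock:
                if deques[wid]:
                    task = deques[wid].popleft()  # local work first
                else:
                    victim = max(range(num_workers),
                                 key=lambda i: len(deques[i]))
                    if deques[victim]:
                        task = deques[victim].pop()  # steal from the tail
            if task is None:
                return  # no work anywhere: this worker is done
            results[wid] += task()

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

With a skewed load (all tasks on one pipe), the idle workers still contribute: the total is correct regardless of which worker ran which task.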
Operator Optimizations:
Aggregation (AGG) Operators: Parallel pre‑aggregation using per‑thread hash maps, parallel merge phase, SIMD vectorized hash computation, LLVM JIT code generation, specialized hash tables (≈60 variants), resize‑aware hash tables, and prefetching.
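The two-phase shape of the AGG operator (thread-local pre-aggregation, then merge) can be sketched as below; the SIMD, JIT, and specialized-hash-table layers are omitted, and `parallel_count_agg` is a hypothetical name for illustration.

```python
import threading
from collections import Counter

def parallel_count_agg(chunks_per_thread):
    """Phase 1: each thread pre-aggregates its own chunks into a private
    hash map (no locks, good cache locality).  Phase 2: the partial maps
    are merged into the final group-by result."""
    partials = [Counter() for _ in chunks_per_thread]

    def pre_aggregate(tid):
        for chunk in chunks_per_thread[tid]:
            partials[tid].update(chunk)  # key -> count, thread-local

    threads = [threading.Thread(target=pre_aggregate, args=(t,))
               for t in range(len(chunks_per_thread))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    final = Counter()
    for p in partials:  # merge phase (itself parallelizable by key range)
        final.update(p)
    return dict(final)
```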
Join Operators: Join order optimization via CBO/Feedback, parallel hash‑build across threads, early data scanning, parallel probe, vectorized hash computation, specialized hash tables, runtime filters (Bloom filters) to prune input data, and adaptive hash table selection.
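The runtime-filter idea can be sketched with a toy Bloom filter: the hash-join build side inserts its join keys, and the filter is then pushed down to the probe-side scan so non-matching rows are dropped before they ever reach the join. The class, sizes, and hash count here are illustrative, not tuned.

```python
class BloomFilter:
    """Tiny Bloom filter over a single integer bitset.  May report false
    positives but never false negatives, which is exactly what a
    runtime filter needs."""
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = 0

    def _positions(self, key):
        for seed in range(self.num_hashes):
            yield hash((seed, key)) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

def runtime_filtered_probe(build_keys, probe_rows, key):
    """Build phase fills the filter; the probe-side scan then prunes rows
    that cannot possibly have a join partner."""
    bf = BloomFilter()
    for k in build_keys:
        bf.add(k)
    return [r for r in probe_rows if bf.might_contain(r[key])]
```

Because the filter never yields false negatives, pruning is safe: every surviving row is re-checked by the real hash join, so the occasional false positive only costs a probe, never a wrong result.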
Conclusion: LibraDB’s execution engine combines multi‑node MPP parallelism with single‑node pipeline parallelism, async I/O, SIMD, runtime filters, and other techniques to maximize CPU, memory, network, and I/O utilization for both OLTP and analytical workloads.
Tencent Cloud Developer