Vectorized Storage Layer Refactoring in Apache Doris: Design, Implementation, and Performance Evaluation
This article explains the motivation, design, and implementation of vectorizing Apache Doris's storage layer using SIMD techniques, covering engine overview, vectorized programming concepts, storage architecture, index and predicate optimizations, delayed materialization, output improvements, and performance test results.
Introduction – The article introduces the vectorized transformation of Apache Doris's storage layer aimed at boosting query performance by leveraging vectorization features.
01 Apache Doris Engine Overview – Doris is positioned as an MPP OLAP database that supports both real‑time and batch data import (from Spark, Flink, and relational databases) and delivers sub‑second query latency, at a lower development and usage cost than building equivalent analytics pipelines on Flink or Spark.
02 Vectorized Programming Introduction – Vectorized programming processes columns in batches rather than row by row, so a single SIMD (single‑instruction, multiple‑data) instruction can operate on many values at once; this is well suited to aggregate calculations such as sum, min, and max in analytical workloads.
03 Apache Doris Storage Layer Overview – The storage layer reads data, deserializes it, and performs fine‑grained splitting, decoding, and merging (compaction). Queries may involve merging multiple files, handling fixed‑length and variable‑length columns, and applying predicate filters.
04 Storage Layer Vectorization Refactoring – Refactoring steps include: (1) identifying code paths suitable for SIMD (e.g., batch reads, comparisons); (2) rewriting those modules with SIMD intrinsics; (3) evaluating alternative optimizations for non‑vectorizable logic.
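Step (2) can be illustrated with a hypothetical batch comparison kernel (illustrative only, not Doris source): an SSE2 path compares four int32 values per instruction and falls back to a scalar loop where SSE2 is unavailable.

```cpp
#include <cstddef>
#include <cstdint>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

// Writes one byte per row: 1 if vals[i] > threshold, else 0.
void gt_mask(const int32_t* vals, size_t n, int32_t threshold, uint8_t* out) {
    size_t i = 0;
#ifdef __SSE2__
    const __m128i thr = _mm_set1_epi32(threshold);
    for (; i + 4 <= n; i += 4) {
        __m128i v   = _mm_loadu_si128(reinterpret_cast<const __m128i*>(vals + i));
        __m128i cmp = _mm_cmpgt_epi32(v, thr);  // 0xFFFFFFFF in lanes where v > thr
        // Collapse the four lane sign bits into a 4-bit mask.
        int mask = _mm_movemask_ps(_mm_castsi128_ps(cmp));
        for (int j = 0; j < 4; ++j) out[i + j] = (mask >> j) & 1;
    }
#endif
    for (; i < n; ++i) out[i] = vals[i] > threshold ? 1 : 0;  // scalar tail/fallback
}
```

A result mask like this is what downstream operators consume instead of copying the surviving rows immediately.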
Index‑Based Optimizations – Doris supports prefix indexes and uses bitmap indexes (e.g., RoaringBitmap) to prune rows before reading, reducing I/O. Fixed‑length types (int, float, double) benefit directly from SIMD batch reads, while variable‑length types (strings) require dictionary encoding or conversion to numeric forms first.
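The string case can be sketched as follows; the `DictColumn` type and its methods are illustrative names, not Doris APIs. Dictionary encoding maps each distinct string to an int32 code, so an equality predicate on strings becomes a numeric comparison over a flat code array, which SIMD handles well.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative dictionary-encoded string column.
struct DictColumn {
    std::unordered_map<std::string, int32_t> dict;  // value -> code
    std::vector<int32_t> codes;                     // one code per row

    void append(const std::string& s) {
        auto it = dict.find(s);
        int32_t code = (it == dict.end())
            ? dict.emplace(s, static_cast<int32_t>(dict.size())).first->second
            : it->second;
        codes.push_back(code);
    }

    // Evaluate "col = s": one dictionary lookup, then a numeric scan of codes.
    std::vector<uint8_t> equals(const std::string& s) const {
        std::vector<uint8_t> out(codes.size(), 0);
        auto it = dict.find(s);
        if (it == dict.end()) return out;  // value absent: no row matches
        const int32_t code = it->second;
        for (size_t i = 0; i < codes.size(); ++i)
            out[i] = (codes[i] == code);   // tight int32 loop, auto-vectorizable
        return out;
    }
};
```

The expensive string comparison happens once (the dictionary lookup); the per‑row work is pure integer arithmetic.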
Predicate Push‑Down and Delayed Materialization – Predicate push‑down moves filter evaluation to the storage layer, potentially reducing data volume. Delayed materialization reads non‑predicate columns only after filtering, trading extra seeks for lower data transfer; its effectiveness depends on predicate selectivity and data type costs.
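The two‑phase read behind delayed materialization can be sketched as below (function names are illustrative, not Doris internals): first evaluate the predicate against its own column and keep the surviving row ids, then gather only those rows from the non‑predicate columns.

```cpp
#include <cstdint>
#include <vector>

// Phase 1: evaluate "col >= lo" on the predicate column; keep surviving row ids.
std::vector<uint32_t> eval_predicate(const std::vector<int32_t>& col, int32_t lo) {
    std::vector<uint32_t> selected;
    for (uint32_t i = 0; i < col.size(); ++i)
        if (col[i] >= lo) selected.push_back(i);
    return selected;
}

// Phase 2: materialize another column only for the selected rows. On disk this
// gather costs extra seeks but reads far less data when the predicate is
// selective -- the trade-off the article describes.
std::vector<int64_t> materialize(const std::vector<int64_t>& col,
                                 const std::vector<uint32_t>& selected) {
    std::vector<int64_t> out;
    out.reserve(selected.size());
    for (uint32_t row : selected) out.push_back(col[row]);
    return out;
}
```

With a highly selective predicate the gather touches few rows and wins; with a permissive predicate it degenerates toward reading everything plus the seek overhead, which is why selectivity governs whether delayed materialization pays off.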
Output Optimizations – Not all models need merging; detail tables can stream directly to the execution layer, while key‑based and aggregation models require merging. Batch aggregation using SIMD further improves throughput.
Performance Evaluation – Early testing (SSB benchmark) shows significant storage‑layer speedups and modest SQL gains, though further tuning is needed for end‑to‑end performance.
Conclusion & Recommendations – Effective optimization requires deep code understanding, awareness of SIMD capabilities, and proper use of performance tools; community participation is encouraged for ongoing improvements.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.