Flex: A Stream‑Batch Integrated Vectorized Engine for Flink
This article introduces Flex, a Flink‑compatible, stream‑batch unified vectorized engine built on Velox and Gluten. It explains the SIMD‑based execution model, details native operator optimizations, fallback mechanisms, and correctness and usability improvements, and presents performance results and future development plans.
Flex is a vectorized engine self‑developed by Ant Group that unifies stream and batch processing, leveraging SIMD instructions, a columnar data layout, and the Velox execution framework. It derives its name from the prefixes of Flink and Velox, emphasizing flexibility and plug‑in capability.
The article first explains the background of vectorized computing, describing the load‑compute‑store cycle of traditional SISD execution and how wider registers enable SIMD parallelism. It then outlines the drawbacks of row‑oriented processing: low CPU cache hit rates, the cost of handling variable‑length fields, and virtual‑function call overhead.
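The SISD-versus-SIMD contrast can be sketched in a few lines; this is an illustrative example, not Flex code. In the scalar version, each iteration performs one load, one add, and one store; in the SIMD-friendly version, the fixed 4-wide inner loop corresponds to a single 128-bit vector add on four 32-bit floats (compilers such as GCC and Clang auto-vectorize this shape into SIMD instructions):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// SISD: one load, one add, one store per element.
void add_scalar(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// SIMD-friendly: the 4-wide inner loop maps to one 128-bit vector add
// (e.g. SSE addps) when the compiler auto-vectorizes it.
void add_vectorized(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        for (std::size_t lane = 0; lane < 4; ++lane)  // one vector instruction
            out[i + lane] = a[i + lane] + b[i + lane];
    }
    for (; i < n; ++i)  // scalar tail for leftover elements
        out[i] = a[i] + b[i];
}
```

Both functions compute the same result; the point is that widening the unit of work from one element to one register-full of elements cuts the number of load-compute-store cycles by the register width.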
To unlock SIMD benefits, Flex reorganizes data by columns, improves cache locality, applies the same computation across rows of a column, and reduces function‑call overhead through template‑generated code. The engine integrates with existing native engines like ClickHouse and Velox via a Gluten‑style plan, avoiding a full rewrite.
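The layout difference behind these gains can be illustrated with a minimal sketch (not Flex internals): summing one field over row-oriented records strides across every field of every record, while a columnar layout stores that field contiguously, so each cache line carries only useful values and the loop becomes trivially SIMD-izable:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Row-oriented: fields of one record are adjacent in memory; reading a
// single column strides over unrelated bytes and wastes cache-line capacity.
struct RowRecord {
    int64_t id;
    double  price;
    int32_t quantity;
};

double sum_price_rows(const std::vector<RowRecord>& rows) {
    double sum = 0.0;
    for (const auto& r : rows) sum += r.price;  // stride = sizeof(RowRecord)
    return sum;
}

// Columnar: each field is a dense array; the same loop reads contiguous
// doubles, maximizing cache hits and enabling SIMD over the column.
struct ColumnBatch {
    std::vector<int64_t> id;
    std::vector<double>  price;
    std::vector<int32_t> quantity;
};

double sum_price_columns(const ColumnBatch& batch) {
    double sum = 0.0;
    for (double p : batch.price) sum += p;  // stride = sizeof(double)
    return sum;
}
```

The two loops are logically identical; only the memory layout differs, which is exactly the reorganization the article describes.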
Flex introduces several native‑layer optimizations:
A NativeCalc operator that isolates and converts only the fields used by expressions, reducing data‑conversion overhead from 11% to 6.68%.
Support for native source/sink that reads/writes Arrow‑format data directly, eliminating row‑to‑column conversion.
Rule‑based plan rewrites that split and merge Calc operators based on the presence of SIMD functions in projections or conditions.
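The field-isolation idea behind NativeCalc can be sketched as follows; the names and the flattened row representation here are hypothetical, not Flex's actual API. Rather than converting every field of each row to columnar form, only the columns the expressions actually reference are materialized, which is what shrinks the conversion overhead:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical row: a vector of field values (all doubles for brevity).
using Row = std::vector<double>;

// Convert only the field indices referenced by the expressions,
// producing one contiguous column per referenced field.
std::vector<std::vector<double>> project_referenced_fields(
        const std::vector<Row>& rows,
        const std::set<std::size_t>& referenced) {
    std::vector<std::vector<double>> columns;
    for (std::size_t field : referenced) {
        std::vector<double> col;
        col.reserve(rows.size());
        for (const Row& r : rows) col.push_back(r[field]);
        columns.push_back(std::move(col));
    }
    return columns;  // unreferenced fields are never converted
}
```

If an expression touches 2 of 20 fields, only 10% of the row-to-column conversion work is actually performed.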
Fine‑grained fallback mechanisms are provided: only expressions containing SIMD functions are translated to NativeCalc, function‑signature blacklists can force fallback, and timestamp/decimal types are handled conservatively.
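A minimal sketch of such a fallback decision (hypothetical names; the real rule set is richer): an expression is translated to NativeCalc only if every function it calls is in the SIMD-supported set and none matches the blacklist, while conservative types such as TIMESTAMP and DECIMAL force fallback to the JVM operator:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical expression summary: the functions it calls and the
// SQL types it touches.
struct ExprInfo {
    std::vector<std::string> functions;
    std::vector<std::string> types;
};

bool can_translate_to_native(const ExprInfo& expr,
                             const std::set<std::string>& simd_supported,
                             const std::set<std::string>& blacklist) {
    // Conservatively handled types always fall back to the JVM Calc.
    for (const auto& t : expr.types)
        if (t == "TIMESTAMP" || t == "DECIMAL") return false;
    for (const auto& f : expr.functions) {
        if (blacklist.count(f)) return false;        // signature-blacklist fallback
        if (!simd_supported.count(f)) return false;  // no vectorized implementation
    }
    return true;
}
```

Because the check is per expression rather than per job, a single unsupported function only sends that expression back to the JVM instead of disabling vectorization for the whole plan.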
Correctness is ensured through automated function‑level and job‑level comparison frameworks that run Flink’s existing unit tests under vectorized settings and daily end‑to‑end job comparisons, uncovering mismatches in both Flink and Velox implementations.
Usability improvements include automatic detection of unsupported functions, performance testing via JMH, a vectorization dashboard for job‑level metrics, and native operator support for various DAG components.
Stability measures involve monitoring and alerting for vectorized job configurations and automatic injection of native memory resources during planning.
Production results show that 13% of jobs achieve a 1× (i.e., 100%) TPS increase, 37% see a 40% boost, and the average improvement across all vectorized jobs is 75%, with the best case reaching a 14× speedup.
Future plans cover a new data conversion layer that directly maps RowData to Velox RowVector, support for more operators (including stateful ones), full SQL type coverage, and integration with Paimon for native Parquet/ORC readers.
References:
- Apache Gluten: https://github.com/apache/incubator-gluten
- Velox: https://github.com/facebookincubator/velox
- Meituan tech blog on Spark with Gluten/Velox: https://tech.meituan.com/2024/06/23/spark-gluten-velox.html
AntData
Ant Data leverages Ant Group's leading technological innovation in big data, databases, and multimedia, with years of industry practice. Through long-term technology planning and continuous innovation, we strive to build world-class data technology and products.