Velox: An Open‑Source Unified Execution Engine for Data Systems
Velox is Meta's open‑source unified execution engine that consolidates common data‑intensive components, integrates with engines like Presto, Spark, and TorchArrow, and delivers up to ten‑fold speedups on CPU‑bound queries while simplifying development and fostering a reusable, community‑driven ecosystem.
Meta introduced Velox as an open‑source unified execution engine with two goals: accelerating data‑management systems and simplifying their development. It targets the fragmentation that results when many specialized compute engines each maintain their own versions of similar components.
Velox provides the core building blocks that data engines share: a type system, a columnar vector layout compatible with Apache Arrow, a vectorized expression evaluator, extensible functions, common SQL operators, I/O interfaces (connectors, storage adapters, serializers), and resource‑management primitives. Diverse engines can therefore reuse the same optimized implementations.
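To make the idea of a columnar vector layout and a vectorized expression evaluator concrete, here is a minimal sketch in plain Python. It is not Velox's API: the `FlatVector` class and the `multiply` and `one_minus` kernels are hypothetical names invented for illustration. The sketch only assumes the general Arrow-style idea of storing a column as a contiguous run of values plus a separate validity mask, and of evaluating an expression one whole batch at a time rather than one row at a time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FlatVector:
    """Toy column (hypothetical): contiguous values plus a validity mask."""
    values: List[float]
    valid: List[bool]  # True means the slot is non-null

    @classmethod
    def from_optional(cls, items: List[Optional[float]]) -> "FlatVector":
        # Null slots still occupy a position; their value is a placeholder.
        return cls([0.0 if x is None else float(x) for x in items],
                   [x is not None for x in items])

    def to_optional(self) -> List[Optional[float]]:
        return [v if ok else None for v, ok in zip(self.values, self.valid)]


def multiply(a: FlatVector, b: FlatVector) -> FlatVector:
    """Batch kernel: one call handles every row; a null input yields null."""
    assert len(a.values) == len(b.values)
    return FlatVector([x * y for x, y in zip(a.values, b.values)],
                      [p and q for p, q in zip(a.valid, b.valid)])


def one_minus(a: FlatVector) -> FlatVector:
    """Batch kernel computing 1 - x; nulls pass through unchanged."""
    return FlatVector([1.0 - x for x in a.values], list(a.valid))


# Evaluate price * (1 - discount) column by column, not row by row.
price = FlatVector.from_optional([10.0, 20.0, None, 40.0])
discount = FlatVector.from_optional([0.25, 0.0, 0.5, None])
revenue = multiply(price, one_minus(discount))
print(revenue.to_optional())
```

Each kernel is invoked once per batch, so per-call overhead is amortized over all rows, and null handling lives in the mask rather than in per-row branching. A production engine adds encodings, typed buffers, and SIMD-friendly loops that this toy leaves out.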
Key integrations include Presto, Spark, and TorchArrow. In Presto (the Prestissimo project), C++ Velox workers replace Java workers, achieving roughly ten‑fold speedups on CPU‑intensive TPC‑H queries, 3‑6× on shuffle‑sensitive queries, and a 6‑7× average gain on real‑world interactive workloads. Spark integrates through Intel's Gluten project, which uses a JNI API together with Arrow and Substrait to offload execution to Velox. TorchArrow converts PyTorch data‑frame operations into Velox plans, unifying analytics and ML pipelines.
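The pattern behind these integrations, several front ends lowering their work to one plan format that a single shared engine executes, can be sketched in a few lines of Python. Everything here is hypothetical and intentionally tiny: `Filter`, `Project`, `execute`, `sql_like_frontend`, and `DataFrameFrontend` are illustrative names, not part of Velox, Gluten, Substrait, or TorchArrow.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Batch = Dict[str, List[int]]  # column name -> column values


@dataclass(frozen=True)
class Filter:
    column: str
    min_value: int  # keep rows where column >= min_value


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]


PlanNode = Union[Filter, Project]


def execute(plan: List[PlanNode], batch: Batch) -> Batch:
    """Shared engine (hypothetical): runs a plan from any front end."""
    for node in plan:
        if isinstance(node, Filter):
            keep = [i for i, v in enumerate(batch[node.column])
                    if v >= node.min_value]
            batch = {name: [col[i] for i in keep]
                     for name, col in batch.items()}
        elif isinstance(node, Project):
            batch = {name: batch[name] for name in node.columns}
        else:
            raise TypeError(f"unsupported plan node: {node!r}")
    return batch


def sql_like_frontend() -> List[PlanNode]:
    """Stands in for a SQL engine's planner emitting the shared format."""
    return [Filter("qty", 5), Project(("id",))]


class DataFrameFrontend:
    """Stands in for a data-frame API that records a plan lazily."""

    def __init__(self) -> None:
        self.plan: List[PlanNode] = []

    def where_at_least(self, column: str, min_value: int) -> "DataFrameFrontend":
        self.plan.append(Filter(column, min_value))
        return self

    def select(self, *columns: str) -> "DataFrameFrontend":
        self.plan.append(Project(tuple(columns)))
        return self


data: Batch = {"id": [1, 2, 3], "qty": [3, 5, 9]}
sql_plan = sql_like_frontend()
df_plan = DataFrameFrontend().where_at_least("qty", 5).select("id").plan
print(execute(sql_plan, data), execute(df_plan, data))
```

Because both front ends produce the same plan nodes, any optimization made inside `execute` benefits both, which is the reuse argument the article makes for a unified engine.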
Since its open‑source launch, Velox has attracted over 150 contributors from industry and academia, and its modular design is intended to blur the line between ML infrastructure and traditional data management, encouraging broader adoption and faster innovation in the database ecosystem.
Past Memory Big Data
A popular big-data architecture channel with over 100,000 developers. Publishes articles on Spark, Hadoop, Flink, Kafka and more. Visit the Past Memory Big Data blog at https://www.iteblog.com. Search "Past Memory" on Google or Baidu.
