
Velox: An Open‑Source Unified Execution Engine for Data Systems

Velox is Meta's open‑source unified execution engine that consolidates common data‑intensive components, integrates with engines like Presto, Spark, and TorchArrow, and delivers up to ten‑fold speedups on CPU‑bound queries while simplifying development and fostering a reusable, community‑driven ecosystem.

Past Memory Big Data

Meta introduced Velox, an open‑source unified execution engine designed to accelerate data‑management systems and simplify their development by addressing the fragmentation of many specialized compute engines.

Velox provides the core building blocks that data engines share: a type system, a columnar vector layout compatible with Apache Arrow, a vectorized expression evaluator, extensible functions, common SQL operators, I/O interfaces (connectors, storage adapters, serializers), and resource‑management primitives. Diverse engines can therefore reuse one set of optimized implementations instead of each maintaining its own.
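To make the idea of a columnar vector plus a vectorized expression evaluator concrete, here is a minimal Python sketch. It is an illustration only and not Velox's C++ API: `ColumnVector`, `FUNCTIONS`, `register`, and `evaluate` are hypothetical names, and the validity mask is a plain list of booleans rather than an Arrow-style bitmap.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class ColumnVector:
    """A flat column: values plus a parallel validity mask (True = not null)."""
    values: List[float]
    valid: List[bool]

    @classmethod
    def from_list(cls, items: List[Optional[float]]) -> "ColumnVector":
        # Nulls get a placeholder value; the mask records that they are null.
        return cls([0.0 if x is None else float(x) for x in items],
                   [x is not None for x in items])

    def to_list(self) -> List[Optional[float]]:
        return [v if ok else None for v, ok in zip(self.values, self.valid)]

# Hypothetical function registry, standing in for "extensible functions".
FUNCTIONS: Dict[str, Callable[[float, float], float]] = {}

def register(name: str):
    def wrap(fn):
        FUNCTIONS[name] = fn
        return fn
    return wrap

@register("plus")
def _plus(a: float, b: float) -> float:
    return a + b

@register("multiply")
def _multiply(a: float, b: float) -> float:
    return a * b

def evaluate(expr, batch: Dict[str, ColumnVector]) -> ColumnVector:
    """Evaluate an expression over a whole batch at once.

    expr is a column name (str), a numeric constant, or a tuple
    (function_name, left_expr, right_expr).
    """
    size = len(next(iter(batch.values())).values)
    if isinstance(expr, str):
        return batch[expr]
    if isinstance(expr, (int, float)):
        return ColumnVector([float(expr)] * size, [True] * size)
    name, left, right = expr
    fn = FUNCTIONS[name]
    lhs, rhs = evaluate(left, batch), evaluate(right, batch)
    # A result row is null if either input row is null.
    valid = [a and b for a, b in zip(lhs.valid, rhs.valid)]
    values = [fn(x, y) if ok else 0.0
              for x, y, ok in zip(lhs.values, rhs.values, valid)]
    return ColumnVector(values, valid)

batch = {
    "price": ColumnVector.from_list([10.0, None, 4.0]),
    "qty": ColumnVector.from_list([2, 3, 5]),
}
# price * qty + 1, computed column by column rather than row by row
result = evaluate(("plus", ("multiply", "price", "qty"), 1), batch)
print(result.to_list())  # [21.0, None, 21.0]
```

Each function is looked up once per batch and then applied across the whole column, which is the property that lets a real engine amortize dispatch cost and use tight loops over contiguous data.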

Key integrations include Presto, Spark, and TorchArrow. In Presto (the Prestissimo project), C++ Velox workers replace Java workers, achieving roughly ten‑fold speedups on CPU‑intensive TPC‑H queries, 3‑6× on shuffle‑sensitive queries, and a 6‑7× average gain on real‑world interactive workloads. Spark integrates through Intel's Gluten project, which uses a JNI API and Arrow/Substrait to offload execution to Velox. TorchArrow converts PyTorch data‑frame operations into Velox plans, unifying analytics and ML pipelines.
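The common thread in these integrations is that different frontends lower their work to one plan representation, which a single engine then executes. The Python sketch below illustrates that pattern under simplifying assumptions. It does not use the Velox, Substrait, Gluten, or TorchArrow APIs, and `Scan`, `Filter`, `Project`, `plan_from_query`, and `Frame` are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Shared plan nodes (hypothetical): every frontend produces these.
@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Filter:
    source: object
    column: str
    min_value: float  # keep rows where column > min_value

@dataclass(frozen=True)
class Project:
    source: object
    columns: Tuple[str, ...]

def execute(plan, tables: Dict[str, List[dict]]) -> List[dict]:
    """The single shared executor that all frontends rely on."""
    if isinstance(plan, Scan):
        return list(tables[plan.table])
    if isinstance(plan, Filter):
        rows = execute(plan.source, tables)
        return [r for r in rows if r[plan.column] > plan.min_value]
    if isinstance(plan, Project):
        rows = execute(plan.source, tables)
        return [{c: r[c] for c in plan.columns} for r in rows]
    raise TypeError(f"unknown plan node: {plan!r}")

# Frontend A: a declarative, query-like description.
def plan_from_query(query: dict):
    column, bound = query["where_greater_than"]
    filtered = Filter(Scan(query["from"]), column, bound)
    return Project(filtered, tuple(query["select"]))

# Frontend B: data-frame style method chaining.
class Frame:
    def __init__(self, plan):
        self.plan = plan

    @classmethod
    def table(cls, name: str) -> "Frame":
        return cls(Scan(name))

    def where_greater_than(self, column: str, bound: float) -> "Frame":
        return Frame(Filter(self.plan, column, bound))

    def select(self, *columns: str) -> "Frame":
        return Frame(Project(self.plan, columns))

tables = {"orders": [{"id": 1, "total": 5.0}, {"id": 2, "total": 50.0}]}

query_plan = plan_from_query(
    {"select": ["id"], "from": "orders", "where_greater_than": ("total", 10.0)}
)
frame_plan = Frame.table("orders").where_greater_than("total", 10.0).select("id").plan

print(query_plan == frame_plan)     # True: both frontends yield the same plan
print(execute(frame_plan, tables))  # [{'id': 2}]
```

Because both frontends produce identical plan trees, any optimization made inside `execute` benefits both, which is the reuse argument the integrations above rest on.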

Since its open‑source launch, Velox has attracted over 150 contributors from industry and academia, and its modular design is intended to blur the line between ML infrastructure and traditional data management, encouraging broader adoption and faster innovation in the database ecosystem.

