Step-by-Step Guide: Integrating Presto with Velox on macOS (Build, Configure, and Run)
This article explains why the CPU has become the performance bottleneck in data analytics and introduces the Velox vectorized execution engine. It then gives a detailed, from-scratch tutorial that covers downloading the Presto source, syncing Velox, fixing build paths, compiling the Java and C++ components, configuring CLion and IntelliJ, launching the servers, and running SQL queries. It also notes the current stability issues.
Over the past decade, storage throughput has risen from about 50 MB/s (HDD) to 16 GB/s (NVMe), and network bandwidth has risen from 1 Gbps to 100 Gbps. CPU clock rates, however, have stagnated at around 3 GHz, so the CPU has become the main bottleneck in data analytics. To address this, a number of vectorized execution engines have been built, such as Databricks Photon, ClickHouse, Apache Doris, Intel Gazelle, and Facebook's Velox.
Using Velox with Presto
Velox is a unified execution engine written in C++ and designed to be embedded in many compute engines. Within Facebook, Velox is integrated with Presto (project Prestissimo, open source), with Spark (project Spruce, not open source), and with other systems. Presto is written in Java and Velox in C++, so Presto's Java code cannot simply call into Velox. Facebook therefore created Prestissimo, a C++ implementation of the Presto worker's HTTP REST interface. Prestissimo handles worker-to-worker data exchange (serialization), coordinator-to-worker task orchestration, and status endpoints. It receives a plan fragment from the Java coordinator, converts it into a Velox plan, and executes that plan with Velox.
Code Download and Compilation
Note: The steps below are demonstrated on an Apple M1 Pro running macOS Monterey; other platforms may differ.
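Download the Presto source code. The commands below assume you already have a clone of the presto repository (for example, your fork) in a directory named presto. They add the official repository as the upstream remote:
cd presto
git remote add upstream https://github.com/prestodb/presto.git
git fetch upstream
git checkout upstream/master
Build the Java components, skipping tests and using 12 parallel build threads:
./mvnw clean install -DskipTests -T12
Sync the Velox submodule from the repository root: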
make -C presto-native-execution submodules
git submodule sync --recursive
git submodule update --init --recursive
# The output shows the Velox submodule checked out at commit 2c7eea574d3d7c3d3307528b08c67a77f4636f99
Initialize the dependencies (fizz, thrift, antlr, glog, etc.) by running the provided script:
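cd presto-native-execution
# Make /usr/local writable by the current user, because the setup script installs dependencies there
sudo chown -R "$(whoami)" /usr/local/{bin,lib,sbin}
chmod u+w /usr/local/{bin,lib,sbin}
./scripts/setup-macos.sh
Compile Velox: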
cd velox
make debug
The build writes the compiled libraries to velox/_build/debug.
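The steps in this article were tested against Velox commit 2c7eea574d3d7c3d3307528b08c67a77f4636f99, so it can help to confirm that your checkout matches before you build anything else. The helper below is hypothetical and is not part of Presto. It reads the commit from `git submodule status` output and strips the one-character status prefix that git uses. The submodule path presto-native-execution/velox is assumed from this article's directory layout.

```shell
# Hypothetical helper: print the commit recorded in `git submodule status` output.
# git prefixes each line with ' ' (in sync), '-' (not initialized),
# '+' (checked-out commit differs from the index) or 'U' (merge conflicts).
velox_submodule_sha() {
  awk '{ sub(/^[-+U]/, "", $1); print $1 }'
}

# Usage, from the presto repository root:
#   expected=2c7eea574d3d7c3d3307528b08c67a77f4636f99
#   actual=$(git submodule status presto-native-execution/velox | velox_submodule_sha)
#   [ "$actual" = "$expected" ] || echo "Velox is at $actual, the article used $expected"
```

A `-` prefix means the submodule has not been initialized yet, so rerun the submodule commands before building.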
Compile Prestissimo (the C++ Presto server) from the presto-native-execution directory:
cd ..
make debug
The first build fails because the Thrift headers cannot be found. To fix the missing include path, add /usr/local/include to the CMake include directories in presto-native-execution/CMakeLists.txt. Place the line before the add_subdirectory calls so that the subdirectories inherit it:
include_directories(SYSTEM /usr/local/include)
Re-run make debug. The compilation now succeeds and produces _build/debug/presto_cpp/main/presto_server.
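If the Thrift include error persists, check that the dependency script actually placed headers under /usr/local/include. You can also run this check before editing CMakeLists.txt at all. The helper below is hypothetical. The directory names passed to it (for example thrift and fizz) are assumptions based on the dependencies listed earlier and are not taken from the setup script.

```shell
# Hypothetical helper: report which header directories are missing under a prefix.
# Usage: missing_include_dirs PREFIX NAME...
# Prints one missing name per line and returns non-zero if any are missing.
missing_include_dirs() {
  local prefix=$1 missing=0 name
  shift
  for name in "$@"; do
    if [ ! -d "$prefix/include/$name" ]; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (directory names are assumptions):
#   missing_include_dirs /usr/local thrift fizz || echo "rerun ./scripts/setup-macos.sh"
```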
Launching Java and C++ PrestoServers
There are two ways to start the C++ PrestoServer:
Manually, via an IDE such as CLion or directly from the command line.
Automatically, when launching the Java PrestoServer.
Manual Launch with CLion
Open the presto-native-execution project in CLion, then set the following CMake options:
-DTREAT_WARNINGS_AS_ERRORS=1 -DENABLE_ALL_WARNINGS=1 -DCMAKE_PREFIX_PATH="/usr/local" -DPRESTO_ENABLE_PARQUET="OFF" -GNinja -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DVELOX_BUILD_TESTING=ON -DCMAKE_BUILD_TYPE=Debug
Set the build directory to _build/debug and apply the changes.
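If you prefer to configure outside CLion, you should be able to pass the same CMake options on the command line. The function below is a sketch with a hypothetical name (configure_prestissimo). It assumes you run it from presto-native-execution, that ninja and ccache are installed (the options select the Ninja generator and ccache as the compiler launcher), and that your CMake supports `-S`/`-B` (3.13 or later). CMake refuses to reuse a build directory that was configured with a different generator, so if that happens, pass a fresh directory as the first argument.

```shell
# Hypothetical wrapper: configure Prestissimo with the same options used in CLion.
# Usage: configure_prestissimo [BUILD_DIR]   (defaults to _build/debug)
configure_prestissimo() {
  local build_dir=${1:-_build/debug}
  cmake -S . -B "$build_dir" \
    -DTREAT_WARNINGS_AS_ERRORS=1 \
    -DENABLE_ALL_WARNINGS=1 \
    -DCMAKE_PREFIX_PATH=/usr/local \
    -DPRESTO_ENABLE_PARQUET=OFF \
    -GNinja \
    -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
    -DVELOX_BUILD_TESTING=ON \
    -DCMAKE_BUILD_TYPE=Debug
}

# Usage:
#   configure_prestissimo && cmake --build _build/debug --target presto_server
```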
After reloading the CMake project, create a Run/Debug configuration for the presto_server target with program arguments:
--logtostderr=1 --v=1 --etc_dir=/path/to/presto-native-execution/etc
Set the working directory to the presto-native-execution root and run the configuration. The log line “Announcement succeeded: 202” indicates a successful start. To run multiple servers, change http-server.http.port in etc/config.properties before starting each additional server.
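To run a second C++ server without editing the original configuration, you can copy the etc directory and change only the port. The helper below is hypothetical. It assumes that config.properties sets the port with a line of the form http-server.http.port=NNNN, and it leaves every other file as copied. Depending on your setup, other per-node settings may also need to differ between servers, and this sketch does not handle them.

```shell
# Hypothetical helper: copy an etc directory and give the copy a different HTTP port.
# Usage: clone_etc_with_port SRC_ETC DST_ETC PORT
clone_etc_with_port() {
  local src=$1 dst=$2 port=$3
  mkdir -p "$dst"
  cp -R "$src"/. "$dst"/
  # Rewrite only the port line. Writing to a temp file avoids the
  # BSD/GNU difference in `sed -i` syntax.
  sed "s/^http-server\.http\.port=.*/http-server.http.port=$port/" \
    "$src/config.properties" > "$dst/config.properties.tmp"
  mv "$dst/config.properties.tmp" "$dst/config.properties"
}

# Usage, from presto-native-execution (directory name and port are examples):
#   clone_etc_with_port etc etc-worker2 7778
#   _build/debug/presto_cpp/main/presto_server --logtostderr=1 --v=1 --etc_dir="$PWD/etc-worker2"
```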
Automatic Launch from Java
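Create an IntelliJ Application Run/Debug configuration (e.g., named HiveExternalWorkerQueryRunner) with the following settings:
Main class: com.facebook.presto.hive.HiveExternalWorkerQueryRunner
VM options:
-ea -Xmx5G -XX:+ExitOnOutOfMemoryError -Duser.timezone=America/Bahia_Banderas -Dhive.security=legacy
Environment variables:
PRESTO_SERVER=/path/to/presto_cpp/main/presto_server;DATA_DIR=/path/to/data;WORKER_COUNT=2
PRESTO_SERVER is the path to the C++ presto_server binary built above, DATA_DIR is a local directory where the query runner keeps its Hive data, and WORKER_COUNT is the number of C++ workers to start.
Classpath: use the classpath of the presto-native-execution module.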
Running this configuration starts the Java PrestoServer and spawns the specified number of C++ workers.
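The query runner starts the C++ binary named by PRESTO_SERVER, so it can help to validate the environment variables before launching. The check below is hypothetical and does not mirror any validation that Presto itself performs. It only confirms that PRESTO_SERVER is an executable file, that DATA_DIR is set, and that WORKER_COUNT is a positive integer.

```shell
# Hypothetical preflight check for the run configuration's environment variables.
# Prints a message for each problem and returns non-zero if anything looks wrong.
check_runner_env() {
  local status=0
  # PRESTO_SERVER must name an executable regular file, not a directory.
  if [ ! -f "${PRESTO_SERVER:-}" ] || [ ! -x "${PRESTO_SERVER:-}" ]; then
    echo "PRESTO_SERVER must point to the presto_server executable"
    status=1
  fi
  if [ -z "${DATA_DIR:-}" ]; then
    echo "DATA_DIR is not set"
    status=1
  fi
  case "${WORKER_COUNT:-}" in
    '' | *[!0-9]*)
      echo "WORKER_COUNT must be a positive integer"
      status=1
      ;;
    *)
      if [ "$WORKER_COUNT" -eq 0 ]; then
        echo "WORKER_COUNT must be a positive integer"
        status=1
      fi
      ;;
  esac
  return "$status"
}

# Usage:
#   PRESTO_SERVER=/path/to/presto_cpp/main/presto_server DATA_DIR=/path/to/data WORKER_COUNT=2 check_runner_env
```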
Running SQL Queries
With both servers running, launch the Presto CLI:
presto-cli/target/presto-cli-*-executable.jar --catalog hive --schema tpch
Examples:
presto:tpch> show schemas;
presto:tpch> use tpch;
presto:tpch> show tables;
presto:tpch> select count(*) from customer;
presto:tpch> select count(*) from lineitem;
Query parsing, planning, and plan fragmentation happen in the Java coordinator. The plan fragments themselves execute on the C++ workers. The C++ server may occasionally crash during testing.
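For repeatable checks, it can be convenient to run the same queries non-interactively. The Presto CLI's `--execute` option supports this. The wrapper below is a hypothetical sketch. It assumes PRESTO_CLI holds the path to the executable jar and that the coordinator is reachable at the CLI's default server address. Add `--server` if the query runner listens elsewhere.

```shell
# Hypothetical wrapper: run one SQL statement against the hive.tpch schema and exit.
# Assumes PRESTO_CLI points to presto-cli/target/presto-cli-*-executable.jar.
run_tpch_query() {
  if [ -z "${PRESTO_CLI:-}" ]; then
    echo "set PRESTO_CLI to the presto-cli executable jar" >&2
    return 1
  fi
  "$PRESTO_CLI" --catalog hive --schema tpch --execute "$1"
}

# Usage:
#   export PRESTO_CLI=$(ls presto-cli/target/presto-cli-*-executable.jar)
#   for sql in 'select count(*) from customer' 'select count(*) from lineitem'; do
#     run_tpch_query "$sql"
#   done
```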
Observations
In practice, the C++ PrestoServer can be unstable, frequently throwing exceptions and terminating. Nevertheless, Velox shows promise as a reusable vectorized engine for Presto and other compute frameworks. Both Velox and Prestissimo are still evolving, and production‑ready stability may require more time.
To start the C++ server directly from the terminal instead of through CLion:
/Users/iteblog/data/code/apache/presto/presto-native-execution/_build/debug/presto_cpp/main/presto_server --logtostderr=1 --v=1 --etc_dir=/Users/iteblog/data/code/apache/presto/presto-native-execution/etc
As with the CLion launch, the log line “Announcement succeeded: 202” indicates a successful startup.
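When you start the server from a script instead of watching the terminal, you can wait for the “Announcement succeeded: 202” line before sending queries. The sketch below assumes stderr is redirected to a log file, since the server is started with --logtostderr=1. The function name wait_for_log_line and the log file name are hypothetical, and the timeout value is arbitrary.

```shell
# Hypothetical helper: poll a log file until a line containing PATTERN appears.
# Usage: wait_for_log_line LOG_FILE PATTERN TIMEOUT_SECONDS
# Returns 0 when the pattern is found, non-zero if the timeout expires first.
wait_for_log_line() {
  local log=$1 pattern=$2 timeout=$3 waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if grep -qF -- "$pattern" "$log" 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  # One final check so a zero timeout still inspects the file once.
  grep -qF -- "$pattern" "$log" 2>/dev/null
}

# Usage, from presto-native-execution (log file name is an example):
#   _build/debug/presto_cpp/main/presto_server --logtostderr=1 --v=1 \
#     --etc_dir="$PWD/etc" 2> presto_server.log &
#   wait_for_log_line presto_server.log 'Announcement succeeded: 202' 120 \
#     || echo "server did not announce itself within 120 seconds"
```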
Past Memory Big Data
A popular big-data architecture channel with over 100,000 developers. Publishes articles on Spark, Hadoop, Flink, Kafka and more. Visit the Past Memory Big Data blog at https://www.iteblog.com. Search "Past Memory" on Google or Baidu.
