
An Introduction to Presto: Origins, Features, Architecture, and Quick‑Start Deployment Guide

This article explains Presto’s origin as Facebook’s open‑source OLAP engine, outlines its key characteristics, advantages and drawbacks, describes its overall architecture and query flow, and provides a step‑by‑step guide for downloading, configuring, and launching a Presto cluster for fast interactive analytics.


1. What Is Presto

1.1 Origin

Presto is an open‑source OLAP query engine originally developed by Facebook to support internal data‑analysis experiments. Before Presto, Facebook relied on Hive on MapReduce, which could not meet interactive query requirements. After evaluating external projects, Facebook decided in 2012 to build its own solution, releasing Presto as open source in 2013.

There are two main branches today: PrestoDB, governed by the Presto Foundation (under the Linux Foundation) and still used heavily inside Facebook/Meta, and Trino (formerly PrestoSQL), driven by Presto's original creators as a more general-purpose fork.

1.2 Use Cases

Presto is designed for high‑speed, real‑time analytical queries, making it suitable for data‑analysis, reporting, and low‑latency query scenarios.

2. Features, Advantages and Disadvantages

2.1 Features

Fully in‑memory, pipelined distributed query engine: Presto stores no data itself, and intermediate results are streamed between stages in memory rather than written to disk (unlike MapReduce), avoiding most disk I/O.

Supports multiple data sources (Hive, MySQL, PostgreSQL, Kafka, etc.) and federated queries across them.
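As an illustration of a federated query, the sketch below joins data across two catalogs in a single statement; the catalog, schema, and table names (hive.sales.orders, mysql.crm.customers) are hypothetical and assume a Hive and a MySQL connector are configured.

```sql
-- Hypothetical federated join: order facts live in Hive,
-- customer profiles in MySQL; Presto joins across both catalogs.
SELECT c.name, SUM(o.amount) AS total_spend
FROM hive.sales.orders AS o
JOIN mysql.crm.customers AS c
  ON o.customer_id = c.id
GROUP BY c.name
ORDER BY total_spend DESC
LIMIT 10;
```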

SQL‑compatible query language, user‑friendly.

Optimizes SQL execution plans and leverages distributed execution for concurrency.

Provides a web UI that visualizes query execution (stages, tasks, and operators), allowing users to inspect the entire processing flow.

2.2 Advantages

Handles petabyte‑scale data with in‑memory computation, reducing disk I/O and speeding up queries.

Although memory‑based, it streams data in small pages and releases memory as soon as a page has been processed, so it is not limited to datasets that fit entirely in RAM.

Supports many data sources and federated queries.

Highly extensible and flexible, with rich functions and operators for custom development.

2.3 Disadvantages

High memory consumption due to in‑memory processing.

Complex multi‑table joins can generate large temporary data, affecting performance.

The Coordinator (master) is a single point of failure; if it goes down, the whole cluster becomes unavailable.

Any Worker failure aborts the entire query, resulting in limited fault tolerance.

Slow Workers can become bottlenecks, slowing down the whole query.

3. Overall Architecture

3.1 Architecture Overview

3.1.1 SQL Client

The standard client is the Presto CLI JAR. PrestoDB requires Java 8+, while Trino requires Java 11+ (recent Trino releases require newer JDKs).

The CLI provides a command‑line interface for submitting SQL to the Presto cluster.

3.1.2 Coordinator

The Coordinator is the master node in Presto’s master‑slave architecture. It parses SQL, creates execution plans, splits them into stages and tasks, and dispatches tasks to Workers.

3.1.3 Worker

Workers execute tasks and process data. They obtain data via Connectors (through SPI) and exchange intermediate results with other Workers. The Coordinator gathers final results from Workers and returns them to the SQL client.

3.1.4 Connector

Presto abstracts storage layers via plug‑in Connectors, enabling it to access Hive, MySQL, PostgreSQL, Kafka, and other sources. Users can also develop custom Connectors.

3.1.5 Discovery Service

The Discovery Service coordinates the Coordinator and Workers. Workers register themselves with the service; the Coordinator queries it to obtain available Worker information.

3.2 Query Execution Process

The user submits SQL through the SQL client, which streams results back to the terminal.

Metadata and actual data are fetched from external storage systems (e.g., Hive Metastore, HDFS) because Presto stores no data itself.

The Coordinator receives the query, creates an execution plan, splits it into stages and tasks, and schedules them on Workers.

Workers execute HTTP remote tasks, retrieve data via Connectors, pass intermediate results downstream, and finally return the aggregated result to the Coordinator, which sends it back to the client.
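The stage/task split described above can be inspected directly: Presto's EXPLAIN statement prints the distributed plan before any tasks run. The table name below is hypothetical.

```sql
-- Show how Presto fragments a query into stages
-- before dispatching tasks to Workers.
EXPLAIN (TYPE DISTRIBUTED)
SELECT customer_id, COUNT(*)
FROM hive.sales.orders
GROUP BY customer_id;
```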

4. Presto in the Government Procurement Cloud Data Platform

4.1 Data Gateway

More than 40% of the platform’s external interfaces use Presto as the query engine.

4.2 Data Visualization Platform

Reports, dashboards, and other visualizations rely on Presto for flexible, custom query configurations.

4.3 BI Analysis, Data Extraction and Dashboards

Metabase uses Presto for the majority of its queries, dashboards, and data extraction tasks.

5. Quick‑Start Guide

Deploying Presto locally is straightforward. The steps below use the PrestoDB distribution.

5.1 Download and Extract

Download the presto-server tarball from the official site, extract it into a bigdata directory, then download the presto-cli executable JAR, rename it to presto, and make it executable.

# presto-server
mkdir bigdata
mv presto-server-xxx.tar.gz bigdata
cd bigdata
tar -zxf presto-server-xxx.tar.gz

# presto-cli: move the executable JAR into the server directory, then rename it
mv presto-cli-xxx-executable.jar presto-server-xxx
cd presto-server-xxx
mv presto-cli-xxx-executable.jar presto
sudo chmod +x presto

5.2 Configuration

Create an etc directory inside the server folder (presto-server-xxx/etc) and add the following configuration files.

5.2.1 node.properties

# Environment name – must be identical across the cluster
node.environment=production
# Unique identifier for each node
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
# Directory for logs and other data
node.data-dir=/{localhost}/bigdata/data/presto

5.2.2 config.properties

# Enable this instance as a Coordinator
coordinator=true
# Allow the Coordinator to act as a Worker (not recommended for large clusters)
node-scheduler.include-coordinator=true
# HTTP port
http-server.http.port=8080
# Maximum distributed memory per query
query.max-memory=5GB
# Maximum memory per node for a query
query.max-memory-per-node=1GB
# Maximum total memory per node (user + system)
query.max-total-memory-per-node=2GB
# Enable discovery service
discovery-server.enabled=true
# Coordinator URI
discovery.uri=http://localhost:8080
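The config.properties above runs a single node as both Coordinator and Worker. For a multi-node cluster, a dedicated Worker uses a variant like the sketch below; the coordinator hostname is a placeholder, and discovery-server.enabled is set only on the Coordinator.

```properties
# config.properties on a dedicated Worker node (sketch)
coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
# Point at the Coordinator's embedded discovery service
discovery.uri=http://<coordinator-host>:8080
```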

5.2.3 jvm.config

-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError

5.2.4 log.properties

# Log level (DEBUG, INFO, WARN, ERROR)
com.facebook.presto=INFO

5.2.5 catalog Directory

The catalog folder holds connector configurations. Example files for Hive and MySQL are shown below.

hive.properties

connector.name=hive-hadoop2
hive.metastore.uri=thrift://{hive metastore}:9083
hive.config.resources=/etc/hive/conf/hive-site.xml,/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml

mysql.properties

connector.name=mysql
connection-url=jdbc:mysql://{localhost}:3306
connection-user={userName}
connection-password={password}

5.3 Start the Service

Use the launcher script under presto-server-xxx/bin to start, stop, or restart Presto.

cd presto-server-xxx
sudo ./bin/launcher start

After starting, the UI is available at http://localhost:8080/ui/ .

Launch the client from the server directory:

sudo ./presto
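Once connected (the CLI defaults to localhost:8080), a few sanity-check statements confirm the cluster and catalogs are up; the hive and mysql catalog names assume the connector files configured above.

```sql
-- Quick sanity checks from the presto CLI
SHOW CATALOGS;                        -- hive, mysql, system, ...
SHOW SCHEMAS FROM mysql;              -- databases visible via the connector
SELECT * FROM system.runtime.nodes;   -- registered Coordinator and Workers
```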

To stop, view logs, or restart:

# Stop the service
sudo ./bin/launcher stop
# Run in the foreground, logging to the console
# (log files live under node.data-dir, e.g. .../var/log/server.log)
sudo ./bin/launcher run
# Restart the service
sudo ./bin/launcher restart

Following these steps completes a basic Presto deployment ready for interactive queries.

Reference: Presto: The Definitive Guide (Matt Fuller, Manfred Moser & Martin Traverso).

Tags: Big Data, SQL, Connector, Deployment, Data Warehouse, Presto, Distributed Query Engine
Written by

政采云技术

ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.
