Big Data 33 min read

Comprehensive Guide to Presto: Origins, Architecture, Optimization, and Real‑World Applications

This article provides an in‑depth overview of Presto, covering its history, core principles, architectural components, query optimization techniques, resource management, tuning tips, data model, and case studies from companies like Didi and Youzan, offering practical guidance for deploying and operating the distributed SQL engine at scale.

Big Data Technology & Architecture
Big Data Technology & Architecture
Big Data Technology & Architecture
Comprehensive Guide to Presto: Origins, Architecture, Optimization, and Real‑World Applications

1. Origin

Presto was open‑sourced by Facebook as an MPP (Massively Parallel Processing) engine to enable low‑latency interactive analytics on massive Hadoop data warehouses. The original Facebook version, often called PrestoDB, uses version numbers like 0.xxx. A later community‑driven fork, initially named Presto SQL and now renamed Trino, shares the same core mechanisms.

Who am I? Where do I come from? Where am I going?

Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.
Presto allows querying data where it lives, including Hive, Cassandra, relational databases or even proprietary data stores. A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization.
Presto is targeted at analysts who expect response times ranging from sub‑second to minutes. Presto breaks the false choice between having fast analytics using an expensive commercial solution or using a slow "free" solution that requires excessive hardware.

The official definition emphasizes that Presto is a distributed SQL engine supporting many data sources (HDFS, RDBMS, Kafka, etc.) with a friendly connector API.

2. Features and Use Cases

1. Features

Multi‑tenant: supports hundreds of concurrent memory‑, I/O‑, and CPU‑intensive queries and can scale to thousands of nodes.

Federated queries: developers can write custom connectors to query across heterogeneous data sources.

Internal optimizations: runs on the JVM, uses code generation, and other performance tricks.

2. Typical Scenarios

Interactive analytics – real‑time ad‑hoc queries across Hive, MySQL, etc.

Batch ETL workloads.

Facebook’s A/B testing infrastructure.

3. Overall Architecture

Presto Architecture
Presto Architecture

Presto consists of four main components: Client, Coordinator, Workers, and Connectors.

SQL Submission

Clients submit SQL via JDBC or CLI; the Coordinator receives the query and begins processing.

Parsing and Planning

The Coordinator performs lexical and syntactic analysis to build an abstract syntax tree (AST) and then generates a logical query plan.

SQL Parsing
SQL Parsing

Logical Plan Generation

The logical plan describes the operations without physical execution details.

Logical Plan
Logical Plan

Query Optimization

The Coordinator applies optimizations such as predicate push‑down, adaptive data layout selection, and shuffle reduction.

Optimized Plan
Optimized Plan

Fault Tolerance

Presto relies on client‑side retry for transient errors; full checkpointing is not yet mature.

4. Resource and Scheduling

Presto follows a master‑slave model: one Coordinator, a Discovery Server (often embedded), and many Workers.

Cluster Components
Cluster Components

The Coordinator parses SQL, creates execution stages, schedules tasks to Workers, and aggregates results. Workers execute tasks, fetch data via Connectors, and send results back.

Client‑Coordinator‑Worker Communication
Client‑Coordinator‑Worker Communication

Stage and Task Scheduling

Presto supports two stage‑scheduling strategies: All‑at‑once (parallel) and Phased (dependency‑ordered). Task scheduling distinguishes leaf stages (data fetch) from intermediate stages (data processing).

Split Allocation

Leaf tasks receive one or more splits, which are lightweight descriptors of data fragments (e.g., file offsets). Splits are lazily loaded to reduce I/O and memory pressure.

5. Presto Tuning

Partitioning : Align partitions with query filters to reduce data scanned.

Columnar Storage : Use ORC files for better read performance.

Compression : Prefer Snappy for fast decompression.

Memory Pools (GENERAL_POOL, SYSTEM_POOL, RESERVED_POOL) are configured via config.properties and resources.reserved-system-memory. Example code:

builder.put(RESERVED_POOL, new MemoryPool(RESERVED_POOL, config.getMaxQueryMemoryPerNode()));
builder.put(SYSTEM_POOL, new MemoryPool(SYSTEM_POOL, systemMemoryConfig.getReservedSystemMemory()));
long maxHeap = Runtime.getRuntime().maxMemory();
maxMemory = new DataSize(maxHeap - systemMemoryConfig.getReservedSystemMemory().toBytes(), BYTE);
DataSize generalPoolSize = new DataSize(Math.max(0, maxMemory.toBytes() - config.getMaxQueryMemoryPerNode().toBytes()), BYTE);
builder.put(GENERAL_POOL, new MemoryPool(GENERAL_POOL, generalPoolSize));

Common memory‑related errors and remedies are listed (e.g., increase query.max-total-memory-per-node).

Parallelism : Increase task.concurrency to raise parallel task execution.

SQL Optimizations include selecting only needed columns, pushing predicates to partitions, ordering GROUP BY fields by cardinality, using LIMIT with ORDER BY, employing approximate aggregation functions, replacing multiple LIKE with regexp_like, placing the larger table on the right side of a broadcast join, preferring RANK over row_number, using UNION ALL, and leveraging CTEs ( WITH).

Metadata Caching and connector‑specific parameters can further improve performance.

6. Presto Data Model

Presto uses a three‑level hierarchy similar to MySQL:

Catalog : Represents a data source (e.g., Hive, MySQL).

Schema : Equivalent to a database within the catalog.

Table : The actual table inside a schema.

Fully qualified table name looks like hive.testdata.test (catalog.schema.table).

Internally, data is stored in Pages (column‑oriented blocks) and Blocks (single column encoding). A Page is up to 1 MB and contains up to 16 K rows.

7. Why is Presto Fast?

Pure‑in‑memory parallel computation.

Pipelined execution.

Data locality.

Dynamic code generation for execution plans.

Efficient memory and data structures.

Approximate query techniques (BlinkDB‑style).

GC control.

Unlike Hive, Presto does not require YARN scheduling or intermediate disk writes, enabling sub‑second response times for many queries.

8. Industry Case Studies

Didi adopted Presto for three years, using it as the primary ad‑hoc query engine and Hive‑SQL accelerator across BI, marketing, data quality, and asset management workloads. They built multiple client libraries (JDBC, Go, Python, R, Node.js, HTTP) and a query‑routing gateway that falls back to Spark or Hive when Presto fails.

Youzan uses Presto for temporary queries, BI reporting, metadata quality checks, and data products. Their migration path involved three stages: mixed Presto‑Hadoop deployment, fully independent Presto clusters, and low‑latency dedicated clusters with strict SLA (≤3 s). They tuned resource groups, JVM settings, and split limits to handle small‑file problems and high‑cardinality distinct queries.

9. Summary

Learning Presto involves reading documentation, experimenting with configuration, and applying best‑practice tuning. The engine’s speed stems from in‑memory parallelism, pipelining, and adaptive optimization, but operational stability requires careful resource management, connector configuration, and monitoring.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

optimizationSQLResource ManagementPrestoQuery Engine
Big Data Technology & Architecture
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.