Comprehensive Guide to Presto: Origins, Architecture, Optimization, and Real‑World Applications
This article provides an in‑depth overview of Presto, covering its history, core principles, architectural components, query optimization techniques, resource management, tuning tips, data model, and case studies from companies like Didi and Youzan, offering practical guidance for deploying and operating the distributed SQL engine at scale.
1. Origin
Presto was open‑sourced by Facebook as an MPP (Massively Parallel Processing) engine to enable low‑latency interactive analytics on massive Hadoop data warehouses. The original Facebook version, often called PrestoDB, uses version numbers like 0.xxx. A later community‑driven fork, initially named Presto SQL and now renamed Trino, shares the same core mechanisms.
Who am I? Where do I come from? Where am I going?
Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.
Presto allows querying data where it lives, including Hive, Cassandra, relational databases or even proprietary data stores. A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization.
Presto is targeted at analysts who expect response times ranging from sub‑second to minutes. Presto breaks the false choice between having fast analytics using an expensive commercial solution or using a slow "free" solution that requires excessive hardware.The official definition emphasizes that Presto is a distributed SQL engine supporting many data sources (HDFS, RDBMS, Kafka, etc.) with a friendly connector API.
2. Features and Use Cases
1. Features
Multi‑tenant: supports hundreds of concurrent memory‑, I/O‑, and CPU‑intensive queries and can scale to thousands of nodes.
Federated queries: developers can write custom connectors to query across heterogeneous data sources.
Internal optimizations: runs on the JVM, uses code generation, and other performance tricks.
2. Typical Scenarios
Interactive analytics – real‑time ad‑hoc queries across Hive, MySQL, etc.
Batch ETL workloads.
Facebook’s A/B testing infrastructure.
3. Overall Architecture
Presto consists of four main components: Client, Coordinator, Workers, and Connectors.
SQL Submission
Clients submit SQL via JDBC or CLI; the Coordinator receives the query and begins processing.
Parsing and Planning
The Coordinator performs lexical and syntactic analysis to build an abstract syntax tree (AST) and then generates a logical query plan.
Logical Plan Generation
The logical plan describes the operations without physical execution details.
Query Optimization
The Coordinator applies optimizations such as predicate push‑down, adaptive data layout selection, and shuffle reduction.
Fault Tolerance
Presto relies on client‑side retry for transient errors; full checkpointing is not yet mature.
4. Resource and Scheduling
Presto follows a master‑slave model: one Coordinator, a Discovery Server (often embedded), and many Workers.
The Coordinator parses SQL, creates execution stages, schedules tasks to Workers, and aggregates results. Workers execute tasks, fetch data via Connectors, and send results back.
Stage and Task Scheduling
Presto supports two stage‑scheduling strategies: All‑at‑once (parallel) and Phased (dependency‑ordered). Task scheduling distinguishes leaf stages (data fetch) from intermediate stages (data processing).
Split Allocation
Leaf tasks receive one or more splits, which are lightweight descriptors of data fragments (e.g., file offsets). Splits are lazily loaded to reduce I/O and memory pressure.
5. Presto Tuning
Partitioning : Align partitions with query filters to reduce data scanned.
Columnar Storage : Use ORC files for better read performance.
Compression : Prefer Snappy for fast decompression.
Memory Pools (GENERAL_POOL, SYSTEM_POOL, RESERVED_POOL) are configured via config.properties and resources.reserved-system-memory. Example code:
builder.put(RESERVED_POOL, new MemoryPool(RESERVED_POOL, config.getMaxQueryMemoryPerNode()));
builder.put(SYSTEM_POOL, new MemoryPool(SYSTEM_POOL, systemMemoryConfig.getReservedSystemMemory()));
long maxHeap = Runtime.getRuntime().maxMemory();
maxMemory = new DataSize(maxHeap - systemMemoryConfig.getReservedSystemMemory().toBytes(), BYTE);
DataSize generalPoolSize = new DataSize(Math.max(0, maxMemory.toBytes() - config.getMaxQueryMemoryPerNode().toBytes()), BYTE);
builder.put(GENERAL_POOL, new MemoryPool(GENERAL_POOL, generalPoolSize));Common memory‑related errors and remedies are listed (e.g., increase query.max-total-memory-per-node).
Parallelism : Increase task.concurrency to raise parallel task execution.
SQL Optimizations include selecting only needed columns, pushing predicates to partitions, ordering GROUP BY fields by cardinality, using LIMIT with ORDER BY, employing approximate aggregation functions, replacing multiple LIKE with regexp_like, placing the larger table on the right side of a broadcast join, preferring RANK over row_number, using UNION ALL, and leveraging CTEs ( WITH).
Metadata Caching and connector‑specific parameters can further improve performance.
6. Presto Data Model
Presto uses a three‑level hierarchy similar to MySQL:
Catalog : Represents a data source (e.g., Hive, MySQL).
Schema : Equivalent to a database within the catalog.
Table : The actual table inside a schema.
Fully qualified table name looks like hive.testdata.test (catalog.schema.table).
Internally, data is stored in Pages (column‑oriented blocks) and Blocks (single column encoding). A Page is up to 1 MB and contains up to 16 K rows.
7. Why is Presto Fast?
Pure‑in‑memory parallel computation.
Pipelined execution.
Data locality.
Dynamic code generation for execution plans.
Efficient memory and data structures.
Approximate query techniques (BlinkDB‑style).
GC control.
Unlike Hive, Presto does not require YARN scheduling or intermediate disk writes, enabling sub‑second response times for many queries.
8. Industry Case Studies
Didi adopted Presto for three years, using it as the primary ad‑hoc query engine and Hive‑SQL accelerator across BI, marketing, data quality, and asset management workloads. They built multiple client libraries (JDBC, Go, Python, R, Node.js, HTTP) and a query‑routing gateway that falls back to Spark or Hive when Presto fails.
Youzan uses Presto for temporary queries, BI reporting, metadata quality checks, and data products. Their migration path involved three stages: mixed Presto‑Hadoop deployment, fully independent Presto clusters, and low‑latency dedicated clusters with strict SLA (≤3 s). They tuned resource groups, JVM settings, and split limits to handle small‑file problems and high‑cardinality distinct queries.
9. Summary
Learning Presto involves reading documentation, experimenting with configuration, and applying best‑practice tuning. The engine’s speed stems from in‑memory parallelism, pipelining, and adaptive optimization, but operational stability requires careful resource management, connector configuration, and monitoring.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
