Big Data 14 min read

Presto Implementation and Practice at YouZan: A Big Data Query Engine Journey

The article outlines Presto’s high‑performance, coordinator‑worker architecture and query flow, describes YouZan’s migration from mixed Hadoop deployment to dedicated low‑latency clusters, details challenges such as small‑file handling and regex backtracking with their fixes, and previews future enhancements like Alluxio integration, session property managers, and Ranger‑based multi‑tenant isolation.

Youzan Coder
Youzan Coder
Youzan Coder
Presto Implementation and Practice at YouZan: A Big Data Query Engine Journey

This article introduces Presto, an open-source high-performance distributed SQL query engine developed by Facebook, and its practical implementation at YouZan.

1. Presto Overview

Presto was developed to address Hive's limitations in interactive query scenarios. While Hive is designed for batch processing with high latency, Presto is specifically optimized for interactive queries, providing minute-level or even sub-second low-latency query performance.

1.1 Presto Architecture

Presto uses a Coordinator-Worker architecture with a Coordinator responsible for query parsing, planning, and scheduling, while Workers execute the actual query tasks.

1.2 Query Execution Process

Client sends request to Coordinator

SQL is parsed via ANTLR to generate AST

AST undergoes semantic analysis using metadata

Logical execution plan is generated and optimized through rules

Logical plan is split into different Stages, and Worker nodes are scheduled to generate Tasks

Tasks generate corresponding physical execution plans

Coordinator串联Stages based on scheduling results

Workers execute the physical execution plans

Client continuously pulls query results from Coordinator, which pulls from the final aggregating Worker node

1.3 Why Presto is High-Performance

Pipeline, full in-memory computation

SQL query plan rule optimization

Dynamic code generation technology

Data scheduling localization, focusing on memory overhead efficiency, optimizing data structures, caching, and approximate query techniques

2. Presto Usage Scenarios at YouZan

Data Platform (DP) temporary queries: Exploratory data analysis with desensitization and audit features

BI reporting engine: Analytical reports for merchants

Metadata data quality validation

Data products: CRM data analysis, crowd profiling

3. Presto Evolution at YouZan

Phase 1: Mixed Deployment with Hadoop

Initially, Presto was deployed together with the Hadoop offline cluster. However, users complained about unstable performance due to disk I/O bandwidth being saturated by Hadoop offline tasks.

Phase 2: Independent Presto Cluster

YouZan created a separate Presto cluster with dedicated HDFS environment. They used a shared Hive metadata store, creating a new database that points to the Presto cluster's HDFS NameService.

Phase 3: Low-Latency Dedicated Presto Cluster

For businesses requiring very low response times (under 3 seconds, sometimes even 1 second), dedicated clusters were deployed with full resource isolation and local HDFS. Configuration tuning like setting Task Concurrency to 1 improved performance in high-concurrency scenarios.

4. Problems Encountered and Solutions

4.1 HDFS Small File Problem

Small files caused slow queries due to Presto's split limits:

node-scheduler.max-splits-per-node=100
node-scheduler.max-pending-splits-per-task=10

Solution: Increased these parameters and introduced Adaptive Spark and small file merging tools in the ETL layer.

4.2 Regex Exponential Backtracking

A user's regex caused exponential backtracking, running for over 1 hour. Solution: Configured Presto to use Google RE2J and added query maximum runtime limits.

4.3 Multiple Column DISTINCT Problem

Queries with multiple count distinct columns were slow. While Spark, Hive TEZ, and Calcite optimize this using grouping sets, Presto had not implemented this optimization. YouZan submitted issues and PRs to the community to address this.

4.4 HDFS Namenode Causing Occasional Slow Queries

During NameNode Edit Log Rolling, read requests were blocked by write locks, causing 1-second delays. Potential solutions include using Uber's Observer NameNode, Alluxio, or SSD storage for NameNode disks.

5. Future Outlook

5.1 Presto + Alluxio

Alluxio provides fine-grained memory control, making IO times fast and consistent. While single-task sequential disk read can reach 150MB/s (making CPU the bottleneck), multiple parallel tasks can create disk I/O bottlenecks, where Alluxio helps stabilize performance.

5.2 Presto Session Property Managers

New Presto versions implement session property managers for different workloads, allowing optimal configurations for different business SQL types and data volumes.

5.3 Presto Multi-Tenant Isolation

YouZan plans to integrate with Apache Ranger for multi-tenant data isolation through SQL rewriting.

performance optimizationBig DataFacebookdistributed computingHDFSprestoYouzanSQL Query Engine
Youzan Coder
Written by

Youzan Coder

Official Youzan tech channel, delivering technical insights and occasional daily updates from the Youzan tech team.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.