Big Data 17 min read

Presto at Didi: Architecture, Optimizations, and Operational Experience

At Didi, Presto has been the default ad‑hoc and Hive‑SQL engine for over three years, serving 6,000 users, processing 2‑3 PB daily and 30‑35 trillion rows, with mixed and dedicated clusters, migration to PrestoSQL 340, extensive Hive compatibility, label‑based isolation, a native Druid connector, usability and stability enhancements, and JVM‑level performance optimizations, while planning further resource‑saving upgrades.

Didi Tech
Didi Tech
Didi Tech
Presto at Didi: Architecture, Optimizations, and Operational Experience

Presto has been deployed inside Didi for more than three years and has become the default engine for ad‑hoc and Hive‑SQL acceleration. It serves over 6,000 users, reads 2‑3 PB of HDFS data per day and processes 30‑35 trillion rows. To meet the growing business demands, Didi has performed extensive stability, usability, performance, and cost optimizations.

1. Presto Overview

Presto is an open‑source MPP SQL engine originally from Facebook. It separates compute from storage, accesses data through a Connector SPI, and follows a master‑slave architecture with one Coordinator and many Workers.

2. Low‑Latency Design

Fully in‑memory parallel computation

Pipelined execution

Localized processing

Dynamic plan compilation

Careful memory and data‑structure usage

GC control

No fault‑tolerance (pure memory engine)

3. Business Scenarios at Didi

Hive‑SQL query acceleration

Ad‑hoc data platform queries

BI and custom reports

Marketing activities

Data quality checks

Asset management

Fixed‑data products

4. Cluster Deployment

Didi runs both mixed clusters (shared with Hadoop) and high‑performance clusters (dedicated HDFS, Druid integration). Physical isolation is achieved by labeling workers and scheduling queries based on those labels.

5. Engine Iteration

Since 2017 Didi has tracked Presto versions, moving from PrestoDB 0.192/0.215 to PrestoSQL 340 because of higher community activity, better PR response, and a roadmap focused on ad‑hoc queries rather than ETL.

5.1 Hive‑SQL Compatibility

To lower migration cost, Didi added implicit type conversion, semantic and syntax compatibility, Hive view support, Parquet/HDFS reading, extensive UDFs, and other features. The online SQL success rate now reaches 97‑99% and query performance improves 30‑50% on average, with some cases gaining ten‑fold speedups.

5.2 Physical Resource Isolation

Dynamic label‑based scheduling allows high‑load queries to run on dedicated workers, preventing interference with latency‑sensitive workloads.

5.3 Druid Connector

Presto on Druid leverages Druid’s pre‑aggregation, caching, and real‑time capabilities, delivering 4‑5× performance gains for suitable workloads. The native connector avoids the split‑limitation and extra network hop of the community JDBC‑based connector.

5.4 Usability Enhancements

Tenant and permission management via Hadoop SIMPLE authentication and Apache Ranger

Extended SQL syntax (e.g., add partition, numeric‑prefixed tables/columns)

Feature extensions such as insert‑row count tracking, query progress updates, priority control, custom protocol fields, and support for deprecated formats

5.5 Stability Construction

Monitoring includes Presto plugins for audit logs, JMX metrics exported to Ganglia, logs stored in HDFS/ES, and metrics pushed to Kafka. UI improvements expose worker health and query status. Common Coordinator issues (OOM, Jetty memory leaks, split explosion, thread limits) and Worker issues (CPU overload, GC exhaustion, HDFS client bugs, ORC OOM) are diagnosed with MAT, Perf, and JVM tuning.

6. Performance Optimizations

JVM tuning and parallel Ref‑Proc reduced query time from ~30 s to 3‑4 s.

ORC bloom filters and data‑file merging cut query time by 20‑30%.

Partition pruning reduced Hive Metastore pressure.

Push‑down of Limit, Filter, Project, and Agg to storage.

7. Outlook

Future work includes reducing idle resources during off‑peak hours, upgrading the engine to PrestoSQL 340, and open‑sourcing Didi’s Presto‑on‑Druid code.

Performance OptimizationBig Datacluster managementDistributed SQLDruid ConnectorHive CompatibilityPresto
Didi Tech
Written by

Didi Tech

Official Didi technology account

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.