Big Data 18 min read

How Didi Scaled Presto for Petabyte‑Scale Queries: Architecture & Optimizations

Didi’s three‑year journey with Presto transformed it into the company’s primary ad‑hoc and Hive‑SQL acceleration engine, serving over 6 000 users, processing 2‑3 PB of HDFS data daily, and achieving major gains in stability, performance, cost, and usability through extensive architectural tweaks, resource isolation, connector extensions, and monitoring enhancements.

ITPUB

Oct 10, 2020

How Didi Scaled Presto for Petabyte‑Scale Queries: Architecture & Optimizations

Background

Presto is an open‑source massive‑parallel‑processing (MPP) SQL engine originally from Facebook. It separates compute from storage via a master‑slave model (one Coordinator and many Workers) and accesses data through a Connector SPI.

Architecture

The Coordinator parses SQL, generates execution plans and schedules tasks. Workers execute the tasks. Connectors (Hive, MySQL, Kudu, Kafka, etc.) provide access to various data sources, and custom connectors can be added.

Low‑latency design principles

Fully in‑memory parallel computation

Pipelined execution

Data‑local computation

Dynamic compilation of execution plans

Careful memory‑structure usage

GC control

No fault tolerance (pure in‑memory engine)

Business scenarios at Didi

Hive‑SQL query acceleration

Ad‑hoc data‑platform queries

BI and custom reporting

Marketing activities

Data‑quality checks

Asset management

Fixed‑data products

Cluster deployment

Didi runs two types of clusters:

Mixed clusters : share the existing HDFS pool and use label‑based physical isolation to prevent large queries from affecting small ones.

High‑performance clusters : have dedicated HDFS and can also access Druid for real‑time data.

Engine iteration

From September 2017 to early 2019 Didi evaluated Presto 0.192 and 0.215, then migrated to PrestoSQL 340 because the PrestoSQL community showed higher activity, faster issue response, and a roadmap aligned with Didi’s needs.

Hive‑SQL compatibility

To ease migration from Hive, Didi added implicit type conversion, semantic and syntactic compatibility, Hive view support, Parquet/HDFS reading, extensive UDFs, and other features directly in the engine layer.

Physical resource isolation

Labels are configured per business unit; the Coordinator loads label‑to‑machine mappings and schedules tasks only on Workers belonging to the requested label, achieving physical isolation without proliferating clusters.

Druid connector

A native Presto‑on‑Druid connector leverages Druid’s pre‑aggregation, caching, and real‑time capabilities. It pushes down LIMIT, FILTER, PROJECT and AGGREGATION to Druid, providing 4‑5× performance gains for targeted workloads.

Usability enhancements

Multi‑language client support (JDBC, Go, Python, CLI, R, Node.js, HTTP) integrated with Didi’s internal permission system.

Query‑routing gateway that prefers Presto, falls back to Spark or Hive on failure, and classifies queries by size using historical statistics.

Extended SQL syntax (e.g., ADD PARTITION, numeric‑starting table/column names).

Features such as Hive view support, insert‑row‑count tracking in HMS, query‑progress UI, priority‑based scheduling, custom protocol fields for audit logging, and support for DeprecatedLzoTextInputFormat.

Stability construction

Monitoring includes Presto plugins for log audit, JMX metrics exported to Ganglia, logs stored in HDFS/Elasticsearch, and a Kafka‑based metric pipeline. The UI was enhanced to show Worker health and query success/failure rates, enabling alerting based on failure thresholds.

Troubleshooting guide

Common Coordinator issues (OOM from FileSystem cache, Jetty memory leaks, excessive splits, JVM core dumps) and Worker issues (high system load, HDFS client bugs, Young GC exhaustion, ORC Stripe OOM, memory‑kill policies) are documented with mitigation steps such as disabling FileSystem cache, upgrading Jetty, tuning JVM flags, and applying MAT analysis.

Engine optimization & research

JVM tuning (parallel Ref‑Proc) reduced typical query latency from 30‑60 s to 3‑4 s.

ORC optimizations (bloom filters on string columns) improved performance by 20‑30%.

Data‑governance and small‑file merging halved query times for some workloads.

Partition pruning reduced Hive Metastore pressure.

Push‑down of LIMIT, FILTER, PROJECT and AGG achieved further speedups.

Explorations of Presto on Alluxio and Presto on CarbonData were abandoned due to unfavorable memory‑to‑performance ratios and stability concerns; Presto on Druid remains the primary focus for future work.

Results and outlook

Through these efforts Presto became Didi’s primary ad‑hoc engine, serving >6 000 users, reading 2‑3 PB HDFS per day, and delivering 30‑50% average query‑time reductions and up to 10× improvements for specific cases. Average query latency stays under 2 s on high‑performance clusters. Future work includes addressing nighttime resource under‑utilization, extending label‑based scheduling, and open‑sourcing the Druid connector as part of the upgrade to PrestoSQL 340.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Performance Optimization Big Data SQL Engine cluster management Druid Connector Hive Compatibility presto

Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.