
Design, Optimization, and Future Roadmap of Bilibili's Presto SQL‑on‑Hadoop Architecture

This article details Bilibili's end‑to‑end Presto‑based SQL‑on‑Hadoop architecture: the overall system components, query routing, Presto's feature set, stability and availability enhancements, and performance gains from caching and multi‑datacenter deployment. It closes with the future development roadmap.

Architect

Apr 11, 2022

The offline platform at Bilibili is built on three core compute engines—Presto, Spark, and Hive—combined with HDFS storage and Yarn scheduling. A custom Dispatcher routes SQL queries to the most suitable engine based on syntax, data size, and load, while a Presto gateway manages multi‑cluster routing and integrates with Ranger for fine‑grained security.
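The routing decision described above can be sketched roughly as follows. This is a minimal illustration, not Bilibili's Dispatcher: the class name, the thresholds, and the fallback order (Presto, then Spark, then Hive) are all assumptions made for the example; the source only states that syntax, data size, and load are the inputs.

```java
/** Hypothetical routing sketch; names and thresholds are illustrative assumptions. */
final class DispatcherSketch {
    enum Engine { PRESTO, SPARK, HIVE }

    // Assumed cut-offs, chosen only for illustration.
    static final long PRESTO_MAX_SCAN_BYTES = 1L << 40;   // 1 TiB
    static final long SPARK_MAX_SCAN_BYTES = 50L << 40;   // 50 TiB
    static final double PRESTO_MAX_LOAD = 0.85;           // fraction of Presto capacity in use

    /**
     * @param prestoCompatible whether the SQL is valid under Presto syntax
     * @param scanBytes        estimated input size of the query
     * @param prestoLoad       current Presto cluster load in [0, 1]
     */
    static Engine route(boolean prestoCompatible, long scanBytes, double prestoLoad) {
        // Interactive path: compatible syntax, modest input, and spare capacity.
        if (prestoCompatible
                && scanBytes <= PRESTO_MAX_SCAN_BYTES
                && prestoLoad < PRESTO_MAX_LOAD) {
            return Engine.PRESTO;
        }
        // Otherwise fall back to a batch engine, picking by input size.
        if (scanBytes <= SPARK_MAX_SCAN_BYTES) {
            return Engine.SPARK;
        }
        return Engine.HIVE;
    }
}
```

Under these assumed thresholds, `DispatcherSketch.route(true, 1L << 30, 0.2)` selects Presto, while the same query against a heavily loaded Presto cluster (`prestoLoad = 0.95`) falls back to Spark.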

Presto, originally open‑sourced by Facebook, is an in‑memory MPP query engine with pipelined execution (intermediate data is streamed between stages rather than written to disk, as it would be in a MapReduce‑style shuffle), thread‑level split scheduling, and pluggable data sources. It excels at interactive cross‑source queries but is less suited to large‑scale ETL workloads.

Key improvements include:

Coordinator multi‑active redesign, with global state stored in Redis and a discovery service, which removes the coordinator as a single point of failure.

Label‑based resource isolation and a real‑time punishment mechanism that throttles resource groups exceeding CPU quotas.

Gateway‑level query limiting, feature extraction, and bad‑SQL interception to protect cluster stability.

Integration of Hive UDFs, INSERT OVERWRITE syntax, and Hive Ranger policies for unified access control.

Alluxio caching of hot partitions, resulting in 20‑30% query speedup and more stable RPC latency.

RaptorX local page‑level cache that reduces I/O and improves performance for repeated queries.

Dynamic filter support (broadcast join, partition pruning) that cuts data read from 6.36 TB to 358 GB in benchmark cases.
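To make the dynamic filter item concrete, here is a minimal sketch of the idea behind partition pruning: the join keys seen on the small (build) side are collected at runtime, and the large (probe) side reads only the partitions whose key appears in that set. The class and method names are hypothetical, and partitions are modeled as a simple map from partition value to size in bytes; Presto's actual implementation works on splits and typed domains.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of dynamic partition pruning; not Presto's API. */
final class DynamicFilterSketch {
    /** Distinct join-key values observed while building the small side of the join. */
    static Set<String> collectBuildKeys(List<String> buildSideKeys) {
        return new HashSet<>(buildSideKeys);
    }

    /** Bytes read on the probe side when only partitions matching the filter are scanned. */
    static long bytesScanned(Map<String, Long> partitionBytes, Set<String> filter) {
        long total = 0;
        for (Map.Entry<String, Long> e : partitionBytes.entrySet()) {
            // Partitions whose key cannot match any build-side row are skipped entirely.
            if (filter.contains(e.getKey())) {
                total += e.getValue();
            }
        }
        return total;
    }
}
```

With three daily partitions of 100, 200, and 300 bytes and a build side that only contains the last day, `bytesScanned` returns 300 instead of 600. The same effect at scale is what reduces the bytes read in the benchmark cited above.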

Additional performance tweaks include small‑file split merging, Observer‑NN usage, FileStatus caching, spill‑to‑disk, plan caching, phased multi‑stage execution, and enhanced CBO statistics handling.
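As an illustration of the first tweak, small‑file split merging can be thought of as greedy packing: consecutive small files are grouped into one combined split until a target size is reached, so the scheduler handles fewer, larger units of work. The sketch below is an assumption about the general technique, with hypothetical names and a caller‑supplied target size; it is not taken from Bilibili's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of small-file split merging via greedy packing. */
final class SplitMergeSketch {
    /** Packs consecutive file sizes into groups whose total is at most targetBytes. */
    static List<List<Long>> merge(List<Long> fileSizes, long targetBytes) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Close the current group if adding this file would overflow it.
            if (!current.isEmpty() && currentBytes + size > targetBytes) {
                groups.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            // A file larger than the target ends up in a group of its own.
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

For file sizes 10, 20, 30, 100, 5 and a target of 64, the result is three combined splits: `[10, 20, 30]`, `[100]`, and `[5]`.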

Future roadmap focuses on auto‑scaling with HPA, heuristic indexing, automatic materialized view creation, complex type (Array/Map) read optimizations, ETL routing to Presto‑on‑Spark, and further stability enhancements.

Code example from the implementation (the per‑second CPU accumulator behind the resource‑group punishment mechanism):

    long cSum = lastCSum + usagePerSecond;
    if (cSum <= punishCpuLimit) { cSum = 0; }  // at or under the limit: reset
    else if (cSum >= 2 * punishCpuLimit) { /* record group for punishment */ cSum = cSum - punishCpuLimit; }
    else { cSum = cSum - punishCpuLimit; }     // between 1x and 2x the limit: carry the excess
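The snippet above can be wrapped into a small runnable form to see how the accumulator behaves over several ticks. This is a sketch under assumptions: the `PunishAccumulatorSketch` and `Tick` names are hypothetical, the `punished` flag stands in for the "record group for punishment" step, and the units of `usagePerSecond` and `punishCpuLimit` are taken to be the same CPU‑time measure.

```java
/** Hypothetical wrapper around the accumulator logic shown above. */
final class PunishAccumulatorSketch {
    /** Result of one tick: the sum carried to the next tick and whether the group was flagged. */
    static final class Tick {
        final long cSum;
        final boolean punished;

        Tick(long cSum, boolean punished) {
            this.cSum = cSum;
            this.punished = punished;
        }
    }

    static Tick step(long lastCSum, long usagePerSecond, long punishCpuLimit) {
        long cSum = lastCSum + usagePerSecond;
        if (cSum <= punishCpuLimit) {
            // At or under the limit: the accumulator resets.
            return new Tick(0, false);
        }
        // Over the limit: one limit's worth is deducted and the excess is carried forward.
        // The group is flagged only once the sum reaches at least twice the limit.
        boolean punished = cSum >= 2 * punishCpuLimit;
        return new Tick(cSum - punishCpuLimit, punished);
    }
}
```

With a limit of 100 and per‑second usage of 150, 150, 50, and 30, the carried sum goes 50, 100, 50, 0, and only the second tick flags the group. A single burst is tolerated, while sustained overuse accumulates until it triggers punishment.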

These enhancements collectively improve query latency, resource utilization, and operational reliability for Bilibili's massive data workloads.
