Tagged articles
86 articles
vivo Internet Technology
Mar 25, 2026 · Industry Insights

How Vivo Scaled Marketing Automation with Presto, Bitmap, and StarRocks

This case study details how Vivo’s marketing automation platform evolved its data‑driven architecture—from a Presto‑based wide‑table design, through a Bitmap optimization, to a StarRocks migration—addressing performance bottlenecks, reducing resource costs, and enhancing data security.

Big Data · Bitmap · Data Architecture
0 likes · 11 min read
Shopee Tech Team
Oct 25, 2024 · Big Data

StarRocks at Shopee: Practical Use Cases and Performance Analysis

Shopee’s deployment of StarRocks across DataService, DataGo, and DataStudio demonstrates that its vectorized engine, cost‑based optimizer, and materialized‑view caching can query Hive, Iceberg, Delta Lake and Hudi up to 20,000× faster than Presto, cutting CPU usage and delivering consistently lower latency for complex analytics.

Data Lake · MPP · Presto
0 likes · 11 min read
DataFunSummit
Aug 11, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native platforms, revealing how cloud storage cost models affect performance optimization, and presents case‑study findings from Uber’s Presto production environment that highlight fragmented I/O patterns and the financial impact of storage API calls.

Cost Model · I/O optimization · Presto
0 likes · 3 min read
DataFunTalk
Aug 11, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native environments, revealing how cloud storage cost models affect I/O optimization, and presents Uber Presto case‑study findings that highlight fragmented access patterns and financial implications of storage API calls.

Cost Model · I/O optimization · Presto
0 likes · 3 min read
DataFunSummit
Aug 10, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper investigates the migration of data‑intensive analytics to cloud‑native environments, using Uber’s Presto workload to expose how cloud storage cost models and fragmented I/O patterns affect performance, and proposes optimized I/O strategies to improve cost‑effectiveness and system design.

Cloud Native · Cost Model · I/O optimization
0 likes · 3 min read
DataFunSummit
Aug 7, 2024 · Cloud Native

Optimizing I/O for Data-Intensive Analytics in Cloud-Native Environments: A Case Study of Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native environments, analyzing how cloud storage cost models affect performance optimization, and presents an Uber Presto case study that reveals fragmented I/O patterns and proposes cost‑effective optimization strategies.

Cloud Native · I/O optimization · Presto
0 likes · 3 min read
DataFunTalk
Aug 2, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber’s Presto Deployment

This whitepaper examines the trend of moving data‑intensive analytics workloads to cloud‑native platforms, revealing how cloud storage cost models affect I/O optimization, and using Uber’s Presto production data to show that traditional I/O strategies overlook costly storage API calls, leading to high expenses.

Case Study · Cost Model · I/O optimization
0 likes · 3 min read
DataFunTalk
Jul 28, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry shift of moving data‑intensive analytics workloads to cloud‑native platforms, revealing how cloud storage cost models affect I/O optimization and presenting Uber Presto case‑study findings that highlight fragmented access patterns and associated financial impacts.

Case Study · I/O optimization · Presto
0 likes · 3 min read
DataFunSummit
Jul 27, 2024 · Cloud Native

Migrating Data‑Intensive Analytics to Cloud‑Native Environments: Cost‑Aware I/O Optimization Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native platforms, revealing how cloud storage’s unique cost model demands finer‑grained I/O optimization, illustrated through an empirical case study of Uber’s Presto production environment and its fragmented access patterns.

Case Study · Cost Model · Data Analytics
0 likes · 3 min read
DataFunSummit
Jul 26, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: A Case Study of Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native platforms, analyzing how cloud storage cost models affect performance optimization, and presents a case study of Uber’s Presto production environment that reveals fragmented I/O patterns and the financial impact of storage API calls.

Cost Model · I/O optimization · Presto
0 likes · 3 min read
DataFunTalk
Jul 26, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native environments, analyzes how cloud storage cost models affect performance optimization, and presents Uber Presto case‑study findings that reveal fragmented access patterns and hidden financial costs of traditional I/O strategies.

Case Study · Cost Model · I/O optimization
0 likes · 3 min read
DataFunSummit
Jun 22, 2024 · Cloud Native

Optimizing I/O for Data-Intensive Analytics in Cloud-Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of migrating data‑intensive analytics workloads to cloud‑native environments, revealing how cloud storage’s unique cost model demands finer‑grained performance optimization, and presents Uber Presto case‑study findings that expose fragmented I/O patterns and associated financial impacts.

Cloud Native · Cost Model · Data Analytics
0 likes · 3 min read
DataFunTalk
Jun 9, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native platforms, analyzes how cloud storage cost models affect performance optimization, and presents Uber Presto case‑study findings that reveal fragmented access patterns and new I/O strategies to improve cost‑effectiveness.

Case Study · Cost Model · I/O optimization
0 likes · 3 min read
DataFunSummit
Jun 6, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native platforms, analyzes the unique cost model of cloud storage, and presents case‑study findings from Uber's Presto production environment to guide efficient I/O design and cost‑effective performance optimization.

Cost Model · I/O optimization · Presto
0 likes · 3 min read
DataFunTalk
May 31, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics applications to cloud‑native environments, revealing how cloud storage cost models affect performance optimization, and presents case‑study findings from Uber’s Presto production workload that highlight fragmented I/O patterns and the financial impact of storage API calls.

I/O optimization · Presto · cloud-native
0 likes · 3 min read
DataFunTalk
Sep 9, 2023 · Big Data

Presto + Tencent DOP (Alluxio) Architecture and Optimization Practices for Financial OLAP

This article presents the practical implementation of Presto combined with Tencent DOP (Alluxio) in a financial OLAP scenario, detailing background and architectural evolution, the Presto‑Alluxio design, optimization techniques for caching, storage scalability, ORC handling, and performance results, followed by conclusions and future directions.

Alluxio · Big Data · OLAP
0 likes · 15 min read
ByteDance Data Platform
May 29, 2023 · Databases

Which Open‑Source OLAP Engine Wins the TPC‑DS Benchmark? A Deep Performance Comparison

Using the TPC‑DS benchmark’s 99 queries on a 1 TB dataset, this study evaluates the performance of four open‑source OLAP engines—ClickHouse, Doris, Presto, and ByConity—across basic, join, aggregation, subquery, and window‑function scenarios, revealing ByConity’s superior speed and the limitations of ClickHouse.

ByConity · OLAP · Presto
0 likes · 12 min read
DataFunSummit
Mar 20, 2023 · Backend Development

Unified UDF Implementation on Cloud Platform: Architecture, Features, and Open‑Source Contributions

This article introduces a unified User‑Defined Function (UDF) solution on a cloud data platform, detailing its remote execution architecture, compatibility with Hive UDFs, resource isolation, hot‑update capabilities, internal platform implementation, open‑source contributions to PrestoDB, and future work plans.

Open source · Presto · Serverless
0 likes · 11 min read
StarRing Big Data Open Lab
Mar 17, 2023 · Big Data

How Data Federation Transforms Enterprise Data Integration and Analytics

This article explains the concept of data federation, its advantages over traditional ETL, key architectural components, practical use cases such as virtual ODS, data staging, warehouse extension, heterogeneous migration, and compares Presto and Trino as distributed query engines for unified, secure, and low‑cost data access.

Distributed Query · ETL alternative · Presto
0 likes · 21 min read
DataFunTalk
Feb 17, 2023 · Big Data

Tencent Alluxio (DOP) Deployment and Optimization in Financial Data Analytics

This article describes how Tencent's Alluxio-based Data Orchestration Platform (DOP) was applied to financial analytics, detailing the business background, challenges of large‑scale OLAP workloads, the Alluxio architecture and usage modes, performance results, and the series of optimizations and tuning performed to achieve significant speedups.

Alluxio · Big Data · Data Orchestration
0 likes · 15 min read
DataFunTalk
Feb 12, 2023 · Big Data

Optimizing Bilibili Presto Cluster Query Performance with Alluxio and Local Cache

This article presents a comprehensive technical overview of Bilibili's Presto cluster architecture, the challenges of query performance on Hadoop, and the systematic optimizations—including Alluxio integration, local cache mechanisms, multi‑active coordinators, label‑based scheduling, and real‑time penalties—that together improve availability, stability, and latency for large‑scale analytics workloads.

Alluxio · Big Data · Cache
0 likes · 23 min read
Big Data Technology & Architecture
Feb 6, 2023 · Big Data

Real-Time Data Warehouse Solutions with Hudi: Scenarios, Challenges, and Optimizations

This article presents an in‑depth overview of real‑time data‑warehouse scenarios, discusses challenges such as timeliness, update efficiency, and resource consumption, and details practical solutions using Apache Hudi, Flink, Presto, and related optimizations for ingestion, indexing, compaction, and query performance.

Big Data · Data Lake · Flink
0 likes · 17 min read
DataFunSummit
Jan 1, 2023 · Big Data

Shopee Data Infra Presentation: Storage Status, Acceleration, Serviceization, and Future Plans

The Shopee Data Infra talk details the current storage architecture, Presto‑based acceleration with Alluxio caching, service‑oriented storage solutions using Alluxio Fuse and S3 APIs, and outlines future enhancements for Spark/Hive integration and CSI/Fuse optimizations, providing a comprehensive view of large‑scale big data storage engineering.

Alluxio · Cache Manager · Kubernetes
0 likes · 16 min read
政采云技术
Dec 20, 2022 · Big Data

An Introduction to Presto: Origins, Features, Architecture, and Quick‑Start Deployment Guide

This article explains Presto’s origin as Facebook’s open‑source OLAP engine, outlines its key characteristics, advantages and drawbacks, describes its overall architecture and query flow, and provides a step‑by‑step guide for downloading, configuring, and launching a Presto cluster for fast interactive analytics.

Connector · Deployment · Presto
0 likes · 16 min read
DataFunTalk
Nov 19, 2022 · Big Data

Improving Bilibili Offline Cluster Performance with Presto and Alluxio

This technical presentation explains how Bilibili reduced database pressure and query latency in its production environment by integrating Presto with Alluxio, detailing the offline cluster architecture, challenges of compute‑storage separation, caching strategies, consistency mechanisms, performance gains, and future work.

Alluxio · Cache · Presto
0 likes · 17 min read
vivo Internet Technology
Oct 26, 2022 · Big Data

Cardinality Counting in Presto: Algorithms, Implementation, and Best Practices

The article explains cardinality counting in Presto, comparing exact set‑based methods with memory‑efficient bitmap, Linear Count, and HyperLogLog approximations, detailing their algorithms, implementation in Presto’s query engine, and offering best‑practice recommendations for choosing the appropriate technique in business workloads.

Bitmap · HyperLogLog · Presto
0 likes · 16 min read
DataFunTalk
Aug 31, 2022 · Big Data

Alluxio Data Orchestration and Cache Acceleration in China Unicom: Use Cases and Performance Gains

This article presents Zhang Ce's detailed overview of Alluxio's deployment at China Unicom, covering cache acceleration, compute‑storage separation, mixed‑load workloads, and lightweight analysis, and demonstrates how these strategies dramatically improve performance, scalability, and cost efficiency for big data processing.

Alluxio · Cache Acceleration · Data Orchestration
0 likes · 19 min read
Big Data Technology Architecture
Aug 23, 2022 · Big Data

Apache Hudi 0.12.0 Release Highlights: Presto Connector, Archive Beyond Savepoint, File‑System Locks, Deltastreamer Termination, Spark & Flink Support, Performance Improvements, and Configuration Updates

The Apache Hudi 0.12.0 release introduces a native Presto connector, archive‑beyond‑savepoint capability, file‑system based locking, new deltastreamer termination strategies, expanded Spark and Flink support, numerous performance enhancements, and a series of configuration and API updates for better data‑lake management.

Apache Hudi · Flink · Presto
0 likes · 12 min read
ITPUB
Jul 23, 2022 · Information Security

How Bilibili Secured Hadoop: Ranger‑Based HDFS and Hive Access Control Deep Dive

This article details Bilibili's implementation of Apache Ranger for fine‑grained access control across Hadoop, HDFS, Hive, Spark, and Presto, covering architecture, API redesign, admin optimizations, gray‑release strategies, permission pre‑checks, data masking, and future plans for incremental policy loading.

HDFS · Presto · Spark
0 likes · 16 min read
Bilibili Tech
Jul 22, 2022 · Information Security

Design and Optimization of Ranger‑Based Access Control for HDFS and Hive in Bilibili's Data Platform

Bilibili’s data platform redesigns Ranger‑based access control by simplifying HDFS and Hive policy APIs, parallelizing policy loading, adding gray‑release and pre‑check mechanisms, integrating fine‑grained Hive authorization with data‑masking, extending support to Spark and Presto, and planning incremental loading, policy fusion, and a NameNode proxy to boost security and performance.

HDFS · Presto · Spark
0 likes · 15 min read
dbaplus Community
May 12, 2022 · Big Data

How Bilibili Scaled Presto on Hadoop: Architecture, Optimizations, and Performance Gains

This article details Bilibili's end‑to‑end Presto on Hadoop architecture, covering the multi‑engine SQL stack, dispatcher routing, cluster scale, stability enhancements such as coordinator HA and real‑time penalties, query limits, Hive UDF compatibility, insert‑overwrite support, Alluxio caching, multi‑datacenter routing, query result caching, RaptorX local cache, JDK upgrades, dynamic filtering, and the future roadmap, illustrating how these changes boosted query throughput and reduced latency.

Big Data · Cluster Management · Distributed Systems
0 likes · 32 min read
Big Data Technology & Architecture
Apr 26, 2022 · Big Data

ByteDance's Internal Presto OLAP Engine: Deployment, Performance Boosts, and Operational Practices

The article details ByteDance's large‑scale deployment of the Presto OLAP engine for ad‑hoc, BI, and near‑real‑time analytics, describing its architecture, multi‑coordinator high‑availability design, routing gateway, adaptive cancel, history server, materialized‑view support, Hudi connector integration, and how these innovations improve performance, stability, and operational efficiency.

Big Data · Hudi Connector · Materialized Views
0 likes · 11 min read
Zuoyebang Tech Team
Apr 13, 2022 · Big Data

How Delta Lake Transformed Our Offline Data Warehouse Performance

This article details how ZuoYeBang's engineering team migrated their Hive‑based offline data warehouse to Delta Lake, tackling latency, scalability, and query‑performance challenges through stream‑to‑batch processing, data‑lake architecture, and optimizations like DPP and Z‑ordering.

Big Data · Delta Lake · Performance Optimization
0 likes · 15 min read
Architect
Apr 11, 2022 · Big Data

Design, Optimization, and Future Roadmap of Bilibili's Presto SQL‑on‑Hadoop Architecture

This article details Bilibili's end‑to‑end Presto‑based SQL‑on‑Hadoop architecture, covering overall system components, query routing, Presto feature set, extensive stability and availability enhancements, performance boosts through caching and multi‑datacenter deployment, and outlines future development plans.

Hadoop · Kubernetes · Performance Optimization
0 likes · 28 min read
Bilibili Tech
Apr 9, 2022 · Big Data

Bilibili Presto on Hadoop: Architecture, Scaling, and Performance Enhancements

Bilibili’s Presto on Hadoop combines a multi‑engine offline platform with Kubernetes‑managed YARN scheduling, Ranger security, and a custom dispatcher, scaling to over 400 nodes handling 160 k daily queries on 10 PB, while adding coordinator HA, resource‑group punishment, query limits, Alluxio caching, dynamic filtering, and numerous SQL‑level enhancements; auto‑scaling and materialized‑view automation are planned as future work.

Big Data · Hadoop · Presto
0 likes · 30 min read
Volcano Engine Developer Services
Dec 29, 2021 · Big Data

Scaling Presto at ByteDance: Architecture, Performance & Stability

ByteDance’s internal Presto platform, supporting nearly one million daily queries across ad‑hoc, BI visualization, and near‑real‑time analytics, achieves high performance and stability through SparkSQL compatibility, multi‑Coordinator architecture, dynamic routing, adaptive query cancellation, History Server, materialized views, and a dedicated Hudi connector.

Distributed SQL · Hudi Connector · Materialized Views
0 likes · 11 min read
DataFunSummit
Dec 18, 2021 · Big Data

Fast OLAP Forum – Latest Practices and Innovations in Real‑Time OLAP

The Fast OLAP Forum held on December 19 at DataFunCon gathers leading experts from Baidu, Tencent, JD, and FreeWheel to share cutting‑edge techniques in vectorized execution, cloud‑native ClickHouse, large‑scale OLAP architectures, and Presto optimizations, offering deep insights for practitioners dealing with massive real‑time data workloads.

Apache Doris · Big Data · OLAP
0 likes · 7 min read
Top Architect
Dec 13, 2021 · Big Data

Design and Implementation of BanYu's Big Data Access Control System

This article describes the evolution from an unsecured data warehouse to a comprehensive big‑data access control system at BanYu, detailing the background, data access methods, design goals, authentication and authorization mechanisms, policy configuration, integration with Metabase, and the overall workflow that balances security with efficiency.

Big Data · LDAP · Presto
0 likes · 15 min read
Architecture Digest
Dec 11, 2021 · Big Data

Design and Implementation of BanYu's Big Data Permission System

This article describes the background, design goals, authentication and authorization mechanisms, system architecture, policy configuration, and Metabase integration of BanYu's big data permission system, highlighting how it balances security and efficiency across Hive, Presto, HDFS, and other components.

Apache Ranger · Presto · access control
0 likes · 16 min read
IT Architects Alliance
Dec 11, 2021 · Big Data

Design and Implementation of Banyu's Big Data Permission System

This article describes the background, design goals, authentication and authorization mechanisms, system architecture, policy configuration, and Metabase integration of Banyu's big data permission system, which secures Hive, Presto, HDFS and other data access components using Apache Ranger and LDAP.

Apache Ranger · Big Data · LDAP
0 likes · 14 min read
21CTO
Dec 9, 2021 · Big Data

Designing a Scalable Big Data Permission System: From Hive to Metabase

BanYu’s early data warehouse lacked any access controls, prompting the creation of a comprehensive big‑data permission system that integrates authentication and authorization across Hive, Presto, HDFS, and Metabase using LDAP, Ranger policies, workflow automation, and both synchronous and asynchronous policy initialization.

Authorization · Big Data · LDAP
0 likes · 16 min read
Big Data Technology & Architecture
Dec 8, 2021 · Big Data

Presto Overview, Architecture, and Query Optimization Techniques

This article introduces Presto, an open‑source MPP SQL engine, explains its coordinator‑worker architecture and connector model, and provides detailed storage, query, and join optimization strategies—including in‑memory parallelism, dynamic plan compilation, and practical SQL code examples—to achieve low‑latency, high‑performance analytics on big data.

Big Data · Presto · query optimization
0 likes · 7 min read
DataFunSummit
Oct 21, 2021 · Big Data

Presto High‑Performance Engine Practice at Meitu: Technical Selection, HA Design, and Cross‑Cluster Scheduling

This article details Meitu's adoption of the Presto ad‑hoc ROLAP engine, comparing it with Hive on Spark and Impala, describing two coordinator high‑availability solutions, and explaining the cross‑cluster scheduling architecture that leverages idle Presto resources to improve overall big‑data processing efficiency.

Big Data · Cross-Cluster Scheduling · Presto
0 likes · 16 min read
21CTO
Oct 6, 2021 · Big Data

Building a Real-Time TB-Scale Bill Query System with Kafka, Kudu, and Presto

This article details the design and implementation of a real‑time, TB‑scale bill‑detail query platform that leverages Kafka for streaming, Debezium and Confluent Platform for change capture, Kudu for low‑latency storage, and Presto/Kylin for fast OLAP queries, while outlining deployment, integration, and future enhancements.

Kafka · Kudu · Presto
0 likes · 19 min read
Architect
Oct 6, 2021 · Big Data

Design and Implementation of a Real-time and Offline Integrated Query System

This article details the requirements, architecture, and implementation of a real-time and offline integrated query system, covering data ingestion via Debezium and Confluent Platform, storage in Kudu and HDFS, query engines Presto and Kylin, and strategies for data synchronization, partitioning, and scaling.

Big Data · Debezium · Kafka
0 likes · 19 min read
DataFunTalk
Sep 10, 2021 · Big Data

Presto High‑Performance Engine Practice at Meitu: Technical Selection, HA Design, and Cross‑Cluster Scheduling

This article details Meitu's adoption of the Presto ad‑hoc ROLAP engine, comparing it with Hive on Spark and Impala, describing enhancements for coordinator high‑availability, and explaining a cross‑cluster scheduling strategy that leverages idle Presto resources to improve overall big‑data workload efficiency.

Big Data · Cross-Cluster Scheduling · HA
0 likes · 16 min read
DataFunTalk
Aug 14, 2021 · Databases

Evolution of OLAP Engines at Lenovo Liancheng Zhida and DorisDB Adoption

The article chronicles Lenovo Liancheng Zhida’s three‑stage evolution of OLAP engines—from early SQL Server scripts, through a Hadoop‑based Presto solution, to the adoption of DorisDB—detailing architecture, tool comparisons, implementation practices, and the performance and operational benefits achieved.

Analytics · Big Data · DorisDB
0 likes · 12 min read
Big Data Technology & Architecture
Jun 21, 2021 · Big Data

Comprehensive Guide to Apache Kylin: Background, Architecture, Installation, Optimization, and Real‑World Use Cases

This article provides an in‑depth overview of Apache Kylin, covering its history, mission, core MOLAP principles, technical architecture, step‑by‑step installation (Docker and Hadoop), performance tuning, advanced cube settings, and detailed case studies from major companies such as Baidu, Lianjia, and Didi.

Apache Kylin · Cube · Docker
0 likes · 53 min read
Big Data Technology & Architecture
Jun 17, 2021 · Big Data

Comprehensive Guide to Presto: Origins, Architecture, Optimization, and Real‑World Applications

This article provides an in‑depth overview of Presto, covering its history, core principles, architectural components, query optimization techniques, resource management, tuning tips, data model, and case studies from companies like Didi and Youzan, offering practical guidance for deploying and operating the distributed SQL engine at scale.

Presto · Query Engine · Resource Management
0 likes · 33 min read
dbaplus Community
May 27, 2021 · Big Data

How Vipshop Scales Billion‑Row OLAP with ClickHouse, Presto, and Flink

This article details Vipshop's OLAP evolution, describing how Presto, Kylin, and ClickHouse are integrated, the deployment architecture with HAproxy and chproxy, containerization on Kubernetes, and the Flink‑ClickHouse pipeline that enables self‑service analysis of hundred‑billion‑row datasets while addressing performance challenges and future roadmap.

Big Data · Flink · OLAP
0 likes · 28 min read
Qu Tech
May 6, 2021 · Big Data

How JuiceFS Cut HDFS Load by 26% and Boosted Presto Query Speed by 13%

This case study details how integrating JuiceFS with Presto reduced HDFS cluster load by about 26%, achieved over 90% cache hit rate for ad‑hoc queries, and lowered average query latency by roughly 13%, while simplifying operations and improving system stability.

Big Data · Cache · HDFS
0 likes · 9 min read
DataFunTalk
Feb 16, 2021 · Big Data

Understanding Presto: Architecture, Query Execution, and Youzan’s Practical Experience

This article explains Presto’s core architecture and low‑latency query execution process, describes how Youzan adopts Presto for various data‑platform scenarios, discusses the evolution of its deployment, and outlines the performance challenges and future enhancements such as Alluxio integration and session property management.

Big Data · Performance Optimization · Presto
0 likes · 13 min read
DataFunTalk
Jan 6, 2021 · Big Data

Didi's Presto Engine: Architecture, Optimizations, and Operational Practices

This article presents Didi's three‑year experience with Presto, detailing its architecture, low‑latency design, large‑scale deployment, extensive Hive compatibility work, resource isolation, Druid connector integration, usability enhancements, stability engineering, performance tuning, and future directions for the ad‑hoc query engine.

Big Data · Distributed Systems · Druid Connector
0 likes · 17 min read
Liulishuo Tech Team
Dec 31, 2020 · Big Data

Migrating a Petabyte-Scale Big Data Platform to Alibaba Cloud: Architecture, Challenges, and Lessons Learned

This article details the end‑to‑end migration of a petabyte‑scale big‑data platform to Alibaba Cloud, describing the DSS synchronization system, its integration with Hive Metastore and Airflow, the gray‑release strategy, data‑consistency validation using Presto, and key takeaways for future cloud migrations.

Big Data Migration · DSS · Hive Metastore
0 likes · 10 min read
Zhongtong Tech
Oct 30, 2020 · Big Data

How Apache Kylin Supercharged OLAP at ZTO Express: A Deep Dive

This article details ZTO Express's journey of adopting Apache Kylin for OLAP, comparing it with Presto, describing platform architecture, performance gains, integration with scheduling and monitoring systems, and the practical optimizations and future plans that enabled sub‑second query responses on massive daily data volumes.

Apache Kylin · Big Data · HBase
0 likes · 16 min read
Alibaba Cloud Developer
Oct 25, 2020 · Big Data

How Alibaba’s Cloud‑Native Data Lake Solves Big Data Challenges

Alibaba Cloud’s Data Lake Analytics (DLA) tackles the growing complexity of data scenarios by offering cloud‑native, serverless solutions for data lake management, massive metadata construction, and high‑performance Spark and Presto engines, while addressing challenges such as high entry barriers, stability, and multi‑tenant isolation.

Cloud Native · Data Lake · Presto
0 likes · 22 min read
ITPUB
Oct 10, 2020 · Big Data

How Didi Scaled Presto for Petabyte‑Scale Queries: Architecture & Optimizations

Didi’s three‑year journey with Presto transformed it into the company’s primary ad‑hoc and Hive‑SQL acceleration engine, serving over 6,000 users, processing 2‑3 PB of HDFS data daily, and achieving major gains in stability, performance, cost, and usability through extensive architectural tweaks, resource isolation, connector extensions, and monitoring enhancements.

Big Data · Cluster Management · Druid Connector
0 likes · 18 min read
Didi Tech
Oct 9, 2020 · Big Data

Presto at Didi: Architecture, Optimizations, and Operational Experience

At Didi, Presto has been the default ad‑hoc and Hive‑SQL engine for over three years, serving 6,000 users and processing 2‑3 PB daily and 30‑35 trillion rows. The article covers its mixed and dedicated clusters, the migration to PrestoSQL 340, extensive Hive compatibility work, label‑based isolation, a native Druid connector, usability and stability enhancements, JVM‑level performance optimizations, and plans for further resource‑saving upgrades.

Big Data · Cluster Management · Distributed SQL
0 likes · 17 min read
Big Data Technology Architecture
May 31, 2020 · Big Data

Applying Apache Hudi in Medical Big Data: Architecture, Synchronization, Storage Choices, and Future Directions

This article examines the use of Apache Hudi for building a hospital‑wide medical big‑data platform, covering construction background, reasons for selecting Hudi, data synchronization methods, storage mode choices, query optimizations, and future development considerations.

Apache Hudi · Copy-on-Write · Medical Big Data
0 likes · 7 min read
Youzan Coder
Apr 1, 2020 · Big Data

Presto Implementation and Practice at YouZan: A Big Data Query Engine Journey

The article outlines Presto’s high‑performance, coordinator‑worker architecture and query flow, describes YouZan’s migration from mixed Hadoop deployment to dedicated low‑latency clusters, details challenges such as small‑file handling and regex backtracking with their fixes, and previews future enhancements like Alluxio integration, session property managers, and Ranger‑based multi‑tenant isolation.

Facebook · HDFS · Performance Optimization
0 likes · 14 min read
dbaplus Community
Jan 7, 2020 · Databases

Why ClickHouse Beats Presto for Real‑Time Metrics: A Deep Dive

This article examines the shortcomings of a Storm‑based real‑time metric platform, outlines the requirements for a stable, SQL‑driven, fast engine, and explains why ClickHouse was chosen over Presto, detailing performance benchmarks, architectural advantages, cluster configuration, engine options, best practices, and common operational issues.

Presto · Real-time analytics · clickhouse
0 likes · 18 min read
NetEase Game Operations Platform
Dec 5, 2018 · Big Data

Presto + Alluxio Architecture for Interactive Ad‑hoc Queries in NetEase Game Data Warehouse

This article describes how NetEase Games built a Presto‑based interactive ad‑hoc query platform backed by Alluxio caching to achieve sub‑10‑second query latency, outlines the architectural design, performance comparisons with other Hadoop‑based solutions, encountered issues, and future improvement plans.

Alluxio · Big Data · Presto
0 likes · 10 min read
Ctrip Technology
Jul 3, 2018 · Big Data

Ctrip's Presto Engine: Challenges, Improvements, and Upgrade Roadmap

This article details Ctrip's experience with the Presto distributed SQL engine, outlining the initial performance and stability issues, the comprehensive enhancements made in security, resource control, compatibility, and monitoring, and the multi‑stage upgrade plan that guides its future evolution.

Big Data · Kerberos · Performance Optimization
0 likes · 11 min read
Ctrip Technology
Aug 10, 2017 · Big Data

Design and Implementation of Ctrip's Large-Scale Data Platform

This article details the architectural choices, component selection, performance tuning, and team organization behind Ctrip's big‑data platform, covering Kafka, Presto, Elasticsearch, Gobblin, Zeppelin, REST APIs, and job scheduling to achieve scalable, interactive data analysis and visualization.

ETL · Elasticsearch · Presto
0 likes · 18 min read
dbaplus Community
Jul 16, 2017 · Big Data

How Vipshop Scaled Real‑Time OLAP: From GreenPlum to Presto, Kylin, and Redis

Vipshop faced massive data growth that broke traditional RDBMS, causing slow OLAP queries, inefficient ETL, and long development cycles. It iteratively rebuilt its analytics stack, adding Hadoop/Hive, a self‑service UI, Presto, Kylin, and Redis, to achieve sub‑second query responses, higher concurrency, and a flexible, low‑latency BI solution.

Kylin · OLAP · Presto
0 likes · 23 min read
Liulishuo Tech Team
Sep 24, 2016 · Backend Development

Developing Custom Presto SQL Functions (UDF) with Java Plugins

This tutorial explains how to create, register, and deploy custom scalar, aggregation, and window functions for the Presto distributed query engine using Java annotations, the Presto plugin mechanism, and code examples that illustrate UDF development, plugin packaging, and state handling for aggregation functions.

Java · Presto · UDF
0 likes · 11 min read
Liulishuo Tech Team
Jun 17, 2016 · Big Data

Building a Scalable Big Data Platform on AWS: Architecture and Execution Service Design

This article details the architectural design and implementation of a scalable big data platform built on AWS services, highlighting the transition from HDFS to S3 for storage, the use of EMR for elastic compute, and a custom Execution Service integrated with Consul and Airflow for automated cluster management and task scheduling.

AWS EMR · Airflow · Big Data Architecture
0 likes · 11 min read
21CTO
Mar 31, 2016 · Big Data

Inside Airbnb’s Massive Big Data Platform: Architecture, Lessons & Scaling Secrets

Airbnb’s engineering team outlines the evolution of its big‑data platform, detailing the philosophy behind its architecture, the dual “gold” and “silver” Hive clusters, the migration to Mesos, the use of Presto, Airpal, and Airflow, and the performance and cost gains achieved through these design choices.

Airbnb · Airflow · Big Data
0 likes · 11 min read