Tagged articles

presto

92 articles

Mar 25, 2026 · Industry Insights

How Vivo Scaled Marketing Automation with Presto, Bitmap, and StarRocks

This case study details how Vivo’s marketing automation platform evolved its data‑driven architecture—from a Presto‑based wide‑table design, through a Bitmap optimization, to a StarRocks migration—addressing performance bottlenecks, reducing resource costs, and enhancing data security.

Big Data · Data Architecture · OLAP

11 min read

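The Bitmap stage above is the easiest to picture in code. A minimal sketch, assuming each audience tag maps to a set of small integer user IDs: plain Python ints act as uncompressed bitsets, so audience rules become bitwise AND, OR, and AND-NOT. The tag names and IDs are hypothetical, and the summary does not say which bitmap encoding Vivo used (production systems often use compressed formats such as Roaring).

```python
from typing import Iterable, List


def to_bitmap(user_ids: Iterable[int]) -> int:
    """Set bit i for each user ID i (IDs assumed to be small non-negative ints)."""
    bm = 0
    for uid in user_ids:
        bm |= 1 << uid
    return bm


def cardinality(bm: int) -> int:
    """Number of users in a non-negative bitmap."""
    return bin(bm).count("1")


def members(bm: int) -> List[int]:
    """Decode a non-negative bitmap back into sorted user IDs."""
    out, uid = [], 0
    while bm:
        if bm & 1:
            out.append(uid)
        bm >>= 1
        uid += 1
    return out


# Hypothetical tags: active users in a region, excluding users already contacted.
active = to_bitmap([1, 2, 3, 5, 8, 13])
region = to_bitmap([2, 3, 5, 7, 11, 13])
contacted = to_bitmap([3, 13])

audience = (active & region) & ~contacted  # AND, then AND-NOT
print(members(audience))  # [2, 5]
```

Combining tags becomes a bitwise operation over the ID space rather than a join or a scan over a wide table.
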
Shopee Tech Team

Oct 25, 2024 · Big Data

StarRocks at Shopee: Practical Use Cases and Performance Analysis

Shopee’s deployment of StarRocks across DataService, DataGo, and DataStudio shows how its vectorized engine, cost‑based optimizer, and materialized‑view caching can query Hive, Iceberg, Delta Lake, and Hudi tables up to 20,000× faster than Presto in the reported cases, while cutting CPU usage and delivering consistently lower latency for complex analytics.

Data Lake · Hive · MPP

11 min read

DataFunSummit

Aug 11, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native platforms, revealing how cloud storage cost models affect performance optimization, and presents case‑study findings from Uber’s Presto production environment that highlight fragmented I/O patterns and the financial impact of storage API calls.

Cost Model · I/O optimization · cloud-native

3 min read

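Because object stores bill per request, one generic cloud-aware technique is to coalesce nearby byte ranges (for example, column chunks in one ORC or Parquet file) into fewer, larger GETs. The summary does not say whether the whitepaper proposes exactly this; the sketch below is a generic version, and `max_gap` is a hypothetical tuning knob.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes


def coalesce(ranges: List[Range], max_gap: int) -> List[Range]:
    """Merge byte ranges separated by at most `max_gap` bytes so each group needs one GET.

    Gap bytes are fetched and discarded: fewer requests in exchange for extra transfer.
    Overlapping or adjacent ranges are always merged.
    """
    merged: List[Range] = []
    for off, length in sorted(ranges):
        if merged:
            m_off, m_len = merged[-1]
            m_end = m_off + m_len
            if off - m_end <= max_gap:
                new_end = max(m_end, off + length)
                merged[-1] = (m_off, new_end - m_off)
                continue
        merged.append((off, length))
    return merged


reads = [(0, 4096), (4200, 1000), (10_000, 500), (1_000_000, 2000)]
print(coalesce(reads, max_gap=8192))  # [(0, 10500), (1000000, 2000)]
```

Four reads become two requests; raising `max_gap` trades more wasted bytes for fewer billed calls.
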
DataFunTalk

Aug 11, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native environments, revealing how cloud storage cost models affect I/O optimization, and presents Uber Presto case‑study findings that highlight fragmented access patterns and financial implications of storage API calls.

Cost Model · I/O optimization · cloud-native

3 min read

DataFunSummit

Aug 10, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper investigates the migration of data‑intensive analytics to cloud‑native environments, using Uber’s Presto workload to expose how cloud storage cost models and fragmented I/O patterns affect performance, and proposes optimized I/O strategies to improve cost‑effectiveness and system design.

Cloud Native · Cost Model · I/O optimization

3 min read

DataFunTalk

Aug 9, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper analyzes the shift of data‑intensive analytics to cloud‑native platforms, examines Uber Presto’s fragmented I/O patterns, reveals hidden storage‑API cost impacts, and proposes cloud‑aware I/O optimization strategies to improve performance‑cost efficiency.

Case Study · I/O optimization · Uber

3 min read

DataFunSummit

Aug 7, 2024 · Cloud Native

Optimizing I/O for Data-Intensive Analytics in Cloud-Native Environments: A Case Study of Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native environments, analyzing how cloud storage cost models affect performance optimization, and presents an Uber Presto case study that reveals fragmented I/O patterns and proposes cost‑effective optimization strategies.

Cloud Native · I/O optimization · cloud storage cost

3 min read

DataFunTalk

Aug 2, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber’s Presto Deployment

This whitepaper examines the trend of moving data‑intensive analytics workloads to cloud‑native platforms, revealing how cloud storage cost models affect I/O optimization, and using Uber’s Presto production data to show that traditional I/O strategies overlook costly storage API calls, leading to high expenses.

Case Study · Cost Model · I/O optimization

3 min read

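A back-of-the-envelope version of the cost argument above: reading the same bytes in smaller requests multiplies the request count, and object stores such as Amazon S3 charge per request in addition to storage. The price below is a hypothetical placeholder, not a quoted rate for any provider.

```python
KiB = 1 << 10
MiB = 1 << 20
GiB = 1 << 30


def num_requests(bytes_read: int, request_size: int) -> int:
    """GET requests needed to read `bytes_read` bytes in `request_size` chunks (ceiling division)."""
    return -(-bytes_read // request_size)


def request_cost(bytes_read: int, request_size: int, price_per_1k: float) -> float:
    """Request charges only; bandwidth and storage are ignored in this sketch."""
    return num_requests(bytes_read, request_size) / 1000 * price_per_1k


HYPOTHETICAL_PRICE_PER_1K_GETS = 0.0004  # placeholder, not a real price list

fragmented = request_cost(GiB, 64 * KiB, HYPOTHETICAL_PRICE_PER_1K_GETS)  # 16384 requests
coalesced = request_cost(GiB, 8 * MiB, HYPOTHETICAL_PRICE_PER_1K_GETS)    # 128 requests
print(f"fragmented: ${fragmented:.7f}  coalesced: ${coalesced:.7f}")
```

The per-query amounts look tiny, but the 128x ratio between the two access patterns is multiplied by every query a cluster runs.
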
DataFunTalk

Jul 28, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry shift of moving data‑intensive analytics workloads to cloud‑native platforms, revealing how cloud storage cost models affect I/O optimization and presenting Uber Presto case‑study findings that highlight fragmented access patterns and associated financial impacts.

Case Study · I/O optimization · presto

3 min read

DataFunSummit

Jul 27, 2024 · Cloud Native

Migrating Data‑Intensive Analytics to Cloud‑Native Environments: Cost‑Aware I/O Optimization Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native platforms, revealing how cloud storage’s unique cost model demands finer‑grained I/O optimization, illustrated through an empirical case study of Uber’s Presto production environment and its fragmented access patterns.

Case Study · Cost Model · I/O optimization

3 min read

DataFunSummit

Jul 26, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: A Case Study of Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native platforms, analyzing how cloud storage cost models affect performance optimization, and presents a case study of Uber’s Presto production environment that reveals fragmented I/O patterns and the financial impact of storage API calls.

Cost Model · I/O optimization · cloud-native

3 min read

DataFunTalk

Jul 26, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native environments, analyzes how cloud storage cost models affect performance optimization, and presents Uber Presto case‑study findings that reveal fragmented access patterns and hidden financial costs of traditional I/O strategies.

Case Study · Cost Model · I/O optimization

3 min read

DataFunSummit

Jun 30, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics to cloud‑native platforms, revealing how cloud storage cost models affect I/O performance and presenting case‑study findings from Uber's Presto deployment to guide efficient I/O design in the cloud.

I/O optimization · cloud storage cost · cloud-native

3 min read

Past Memory Big Data

Jun 27, 2024 · Big Data

Inside Presto 2.0: The Native C++ Query Engine Explained

This article provides a detailed technical overview of Presto 2.0, the native C++ query engine built on the Velox library, covering its motivation, vectorized architecture, memory management, performance benchmarks from Meta and IBM, and deployment practices for large‑scale data warehouses.

Big Data · C++ · Data Warehouse

15 min read

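Python has no SIMD, so this only mirrors the shape of the vectorized execution described above: operators consume whole columns of a batch, and a filter produces a selection vector of surviving row indices that later operators reuse instead of materializing a new batch. Function and column names are hypothetical; this is not Velox's API.

```python
from typing import Dict, List

Column = List[int]
Batch = Dict[str, Column]  # columnar batch: column name -> values for every row in the batch


def filter_greater(batch: Batch, column: str, threshold: int) -> List[int]:
    """Evaluate `column > threshold` over the whole column; return a selection vector."""
    values = batch[column]
    return [row for row, v in enumerate(values) if v > threshold]


def add_columns(batch: Batch, left: str, right: str, selection: List[int]) -> Column:
    """Compute `left + right` only for rows that survived the filter."""
    a, b = batch[left], batch[right]
    return [a[row] + b[row] for row in selection]


batch = {"price": [5, 20, 7, 30], "tax": [1, 2, 1, 3]}
selection = filter_greater(batch, "price", 6)            # [1, 2, 3]
totals = add_columns(batch, "price", "tax", selection)  # [22, 8, 33]
```

In a native engine the same loops run over contiguous memory with no per-row interpretation overhead, which is where the speedups come from.
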
DataFunSummit

Jun 22, 2024 · Cloud Native

Optimizing I/O for Data-Intensive Analytics in Cloud-Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of migrating data‑intensive analytics workloads to cloud‑native environments, revealing how cloud storage’s unique cost model demands finer‑grained performance optimization, and presents Uber Presto case‑study findings that expose fragmented I/O patterns and associated financial impacts.

Cloud Native · Cost Model · I/O optimization

3 min read

DataFunTalk

Jun 9, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native platforms, analyzes how cloud storage cost models affect performance optimization, and presents Uber Presto case‑study findings that reveal fragmented access patterns and new I/O strategies to improve cost‑effectiveness.

Case Study · Cost Model · I/O optimization

3 min read

Past Memory Big Data

Jun 6, 2024 · Operations

How Uber Tuned GC to Boost Presto Cluster Stability

Uber runs over 20 Presto clusters serving more than 500,000 daily queries, but frequent full GCs and OOMs threatened stability; by analyzing G1GC behavior and adjusting IHOP, heap waste, free space, and young‑gen size on JDK 8 and JDK 11, they cut full GC occurrences by up to 80% and markedly improved overall reliability.

Cluster stability · G1GC · JDK11

13 min read

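The knobs named in the GC summary correspond to standard HotSpot G1 flags. This sketch generates a Presto `jvm.config` (one flag per line) in Python; mapping "free space" to `G1ReservePercent` is an assumption about the summary's wording, and every numeric value is a hypothetical placeholder rather than Uber's setting.

```python
from typing import List


def g1_jvm_config(heap_gb: int, ihop: int, heap_waste_pct: int,
                  reserve_pct: int, new_size_pct: int, jdk_major: int) -> List[str]:
    """Build jvm.config lines for G1 tuning. All values passed in are placeholders."""
    flags = [
        f"-Xmx{heap_gb}G",
        "-XX:+UseG1GC",
        # Heap occupancy (%) at which G1 starts a concurrent marking cycle.
        f"-XX:InitiatingHeapOccupancyPercent={ihop}",
        # Mixed collections stop once reclaimable garbage is below this % of the heap.
        f"-XX:G1HeapWastePercent={heap_waste_pct}",
        # Heap % kept free as headroom against to-space exhaustion.
        f"-XX:G1ReservePercent={reserve_pct}",
        # G1NewSizePercent is an experimental flag, so it must be unlocked first.
        "-XX:+UnlockExperimentalVMOptions",
        f"-XX:G1NewSizePercent={new_size_pct}",
    ]
    if jdk_major >= 9:
        # Adaptive IHOP (JDK 9+) uses IHOP only as a starting value; disable it to pin IHOP.
        # JDK 8 does not recognize this flag and would refuse to start with it.
        flags.append("-XX:-G1UseAdaptiveIHOP")
    return flags


if __name__ == "__main__":
    print("\n".join(g1_jvm_config(100, 40, 5, 15, 10, jdk_major=11)))
```

The version check matters because the same config file cannot be shared blindly between JDK 8 and JDK 11 clusters.
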
DataFunSummit

Jun 6, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native platforms, analyzes the unique cost model of cloud storage, and presents case‑study findings from Uber's Presto production environment to guide efficient I/O design and cost‑effective performance optimization.

Cost Model · I/O optimization · Performance

3 min read

DataFunTalk

May 31, 2024 · Cloud Native

Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics applications to cloud‑native environments, revealing how cloud storage cost models affect performance optimization, and presents case‑study findings from Uber’s Presto production workload that highlight fragmented I/O patterns and the financial impact of storage API calls.

I/O optimization · cloud-native · cost modeling

3 min read

DataFunTalk

Mar 3, 2024 · Big Data

Alluxio Local Cache for Presto on S3: Architecture, Implementation, and Performance Evaluation at NewsBreak

This article presents NewsBreak's practical deployment of Alluxio Local Cache with Presto on S3, detailing the system architecture, cache design considerations, implementation steps, performance metrics, and future optimization directions to reduce query latency and storage costs.

Alluxio · Big Data · Cache

12 min read

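Alluxio's local cache stores remote data as fixed-size pages on local storage. This in-memory sketch only mirrors that idea with LRU eviction: reads are split into page-aligned fetches, and repeated reads are served locally. Page size, capacity, and the `fetch` callback (standing in for a ranged S3 GET) are hypothetical; this is not Alluxio's API.

```python
from collections import OrderedDict
from typing import Callable, Tuple

PageKey = Tuple[str, int]  # (file path, page index)


class PageCache:
    """Page-granular read-through cache with LRU eviction."""

    def __init__(self, page_size: int, capacity_pages: int,
                 fetch: Callable[[str, int, int], bytes]):
        self.page_size = page_size
        self.capacity = capacity_pages
        self.fetch = fetch  # fetch(path, offset, length) -> bytes from remote storage
        self.pages: "OrderedDict[PageKey, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _page(self, path: str, idx: int) -> bytes:
        key = (path, idx)
        if key in self.pages:
            self.hits += 1
            self.pages.move_to_end(key)  # mark as most recently used
            return self.pages[key]
        self.misses += 1
        data = self.fetch(path, idx * self.page_size, self.page_size)
        self.pages[key] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict least recently used page
        return data

    def read(self, path: str, offset: int, length: int) -> bytes:
        """Serve [offset, offset + length) by stitching together the covering pages."""
        out = bytearray()
        first = offset // self.page_size
        last = (offset + length - 1) // self.page_size
        for idx in range(first, last + 1):
            base = idx * self.page_size
            page = self._page(path, idx)
            start = max(offset, base) - base
            end = min(offset + length, base + self.page_size) - base
            out += page[start:end]
        return bytes(out)
```

Caching whole pages rather than exact byte ranges lets overlapping reads from different queries share entries.
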
Zhengcaiyun Tech (政采云技术)

Jan 11, 2024 · Big Data

Overview of the Government Procurement Cloud Self-Service Data Extraction Platform

This article introduces the self‑service data extraction platform developed by the Government Procurement Cloud, detailing its architecture, core modules such as self‑service extraction, data push, resource management, operation audit, permission controls, performance optimizations, and future development plans.

Big Data · Data Security · Hive

9 min read

Past Memory Big Data

Dec 6, 2023 · Big Data

A Year with Prestissimo: How Meta Leveraged Velox for Presto Vectorization

The article summarizes a PrestoCon talk that reviews Meta's year‑long production experience with Prestissimo—a C++ Presto worker built on the Velox execution engine—highlighting its architecture, integration design, performance gains, and lessons for anyone considering Velox‑based vectorization.

C++ · Meta · Vectorized Execution

2 min read

DataFunTalk

Sep 9, 2023 · Big Data

Presto + Tencent DOP (Alluxio) Architecture and Optimization Practices for Financial OLAP

This article presents the practical implementation of Presto combined with Tencent DOP (Alluxio) in a financial OLAP scenario, detailing background and architectural evolution, the Presto‑Alluxio design, optimization techniques for caching, storage scalability, ORC handling, and performance results, followed by conclusions and future directions.

Alluxio · Big Data · OLAP

15 min read

ByteDance Data Platform

May 29, 2023 · Databases

Which Open‑Source OLAP Engine Wins the TPC‑DS Benchmark? A Deep Performance Comparison

Using the TPC‑DS benchmark’s 99 queries on a 1 TB dataset, this study evaluates the performance of four open‑source OLAP engines—ClickHouse, Doris, Presto, and ByConity—across basic, join, aggregation, subquery, and window‑function scenarios, revealing ByConity’s superior speed and the limitations of ClickHouse.

ByConity · ClickHouse · Doris

12 min read

DataFunSummit

Mar 20, 2023 · Backend Development

Unified UDF Implementation on Cloud Platform: Architecture, Features, and Open‑Source Contributions

This article introduces a unified User‑Defined Function (UDF) solution on a cloud data platform, detailing its remote execution architecture, compatibility with Hive UDFs, resource isolation, hot‑update capabilities, internal platform implementation, open‑source contributions to PrestoDB, and future work plans.

Hive · Open-source · Serverless

11 min read

StarRing Big Data Open Lab

Mar 17, 2023 · Big Data

How Data Federation Transforms Enterprise Data Integration and Analytics

This article explains the concept of data federation, its advantages over traditional ETL, key architectural components, practical use cases such as virtual ODS, data staging, warehouse extension, heterogeneous migration, and compares Presto and Trino as distributed query engines for unified, secure, and low‑cost data access.

Distributed Query · ETL alternative · Trino

21 min read

DataFunTalk

Feb 24, 2023 · Big Data

Presto and Alluxio Integration for Iceberg: Architecture, Best Practices, and Future Work

This article explains how Presto and Alluxio work together to query Iceberg tables, describes their architectures, deployment options, best‑practice recommendations such as using Iceberg native catalogs and local caches, and outlines future research directions for improving CPU usage and off‑heap caching.

Alluxio · Big Data · Cache

14 min read

DataFunTalk

Feb 17, 2023 · Big Data

Tencent Alluxio (DOP) Deployment and Optimization in Financial Data Analytics

This article describes how Tencent's Alluxio-based Data Orchestration Platform (DOP) was applied to financial analytics, detailing the business background, challenges of large‑scale OLAP workloads, the Alluxio architecture and usage modes, performance results, and the series of optimizations and tuning performed to achieve significant speedups.

Alluxio · Big Data · Data Orchestration

15 min read

DataFunTalk

Feb 12, 2023 · Big Data

Optimizing Bilibili Presto Cluster Query Performance with Alluxio and Local Cache

This article presents a comprehensive technical overview of Bilibili's Presto cluster architecture, the challenges of query performance on Hadoop, and the systematic optimizations—including Alluxio integration, local cache mechanisms, multi‑active coordinators, label‑based scheduling, and real‑time penalties—that together improve availability, stability, and latency for large‑scale analytics workloads.

Alluxio · Big Data · Cache

23 min read

Big Data Technology & Architecture

Feb 6, 2023 · Big Data

Real-Time Data Warehouse Solutions with Hudi: Scenarios, Challenges, and Optimizations

This article presents an in‑depth overview of real‑time data‑warehouse scenarios, discusses challenges such as timeliness, update efficiency, and resource consumption, and details practical solutions using Apache Hudi, Flink, Presto, and related optimizations for ingestion, indexing, compaction, and query performance.

Big Data · Data Lake · Flink

17 min read

dbaplus Community

Jan 31, 2023 · Big Data

Building ByteDance’s Real‑Time Data Warehouse with Hudi: Architecture & Solutions

This article explains how ByteDance designed and deployed a real‑time data warehouse on a data lake using Hudi, detailing three business scenarios, the challenges of latency, consistency and resource usage, and the engineering solutions—including upserts, compaction services, indexing, and future unified storage plans.

Data Lake · Flink · Hudi

14 min read

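Hudi merge-on-read tables pair base files with log files of newer writes, and a precombine field (often an event timestamp) decides which version of a record wins when upserts collide. This sketch mimics that read-time merge in memory; the field names, including the `_deleted` marker, are hypothetical, and this is not Hudi's implementation.

```python
from typing import Dict, List

Record = Dict[str, object]


def merge_on_read(base: List[Record], log: List[Record],
                  key: str, precombine: str) -> List[Record]:
    """Merge a base snapshot with log records by record key.

    For each key the version with the largest precombine value wins (ties go to the
    later log record). A winning record with `_deleted=True` removes the key.
    """
    merged: Dict[object, Record] = {r[key]: r for r in base}
    for rec in log:
        current = merged.get(rec[key])
        if current is None or rec[precombine] >= current[precombine]:
            merged[rec[key]] = rec
    live = [r for r in merged.values() if not r.get("_deleted", False)]
    return sorted(live, key=lambda r: r[key])


base = [{"id": 1, "ts": 10, "v": "a"}, {"id": 2, "ts": 10, "v": "b"}]
log = [
    {"id": 1, "ts": 12, "v": "a2"},      # upsert: newer version of id 1
    {"id": 1, "ts": 11, "v": "stale"},   # late arrival with an older ts: ignored
    {"id": 3, "ts": 12, "v": "c"},       # insert
    {"id": 2, "ts": 13, "_deleted": True},
]
snapshot = merge_on_read(base, log, key="id", precombine="ts")
```

Compaction can be thought of as running this merge ahead of time and writing the result as a new base file, so readers no longer pay for it.
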
DataFunSummit

Jan 1, 2023 · Big Data

Shopee Data Infra Presentation: Storage Status, Acceleration, Serviceization, and Future Plans

The Shopee Data Infra talk details the current storage architecture, Presto‑based acceleration with Alluxio caching, service‑oriented storage solutions using Alluxio Fuse and S3 APIs, and outlines future enhancements for Spark/Hive integration and CSI/Fuse optimizations, providing a comprehensive view of large‑scale big data storage engineering.

Alluxio · Cache Manager · Data Infrastructure

16 min read

Zhengcaiyun Tech (政采云技术)

Dec 20, 2022 · Big Data

An Introduction to Presto: Origins, Features, Architecture, and Quick‑Start Deployment Guide

This article explains Presto’s origin as Facebook’s open‑source OLAP engine, outlines its key characteristics, advantages and drawbacks, describes its overall architecture and query flow, and provides a step‑by‑step guide for downloading, configuring, and launching a Presto cluster for fast interactive analytics.

Connector · Deployment · SQL

16 min read

DataFunTalk

Nov 19, 2022 · Big Data

Improving Bilibili Offline Cluster Performance with Presto and Alluxio

This technical presentation explains how Bilibili reduced database pressure and query latency in its production environment by integrating Presto with Alluxio, detailing the offline cluster architecture, challenges of compute‑storage separation, caching strategies, consistency mechanisms, performance gains, and future work.

Alluxio · Cache · presto

17 min read

Past Memory Big Data

Nov 15, 2022 · Big Data

How Uber Accelerated Presto Queries with Alluxio Local Cache

Uber processes over 500,000 daily Presto queries across 20 clusters handling more than 50 PB of data, and by deploying Alluxio Local Cache on NVMe disks they raised cache‑hit rates from roughly 65% to over 90% while addressing real‑time partition updates, node churn, and cache‑size constraints.

Alluxio · Big Data · Consistent Hashing

15 min read

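The Uber entry is tagged with consistent hashing, a common way to keep cache affinity under node churn: splits for the same file keep landing on the same worker, and when a worker leaves, only its keys move. How Uber wires this into Presto's scheduler is not described in the summary; the class and key format below are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes, mapping cache keys (e.g. file paths) to workers."""

    def __init__(self, nodes: List[str], vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (hash, node) points on the ring
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [point for point in self._ring if point[1] != node]

    def lookup(self, key: str) -> str:
        """Return the owner of the first ring point at or after the key's hash (wrapping)."""
        if not self._ring:
            raise LookupError("ring is empty")
        i = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[i % len(self._ring)][1]
```

Virtual nodes spread each worker over many ring positions, so load stays roughly even and a departing worker's keys scatter across the survivors instead of landing on one neighbor.
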
vivo Internet Technology

Oct 26, 2022 · Big Data

Cardinality Counting in Presto: Algorithms, Implementation, and Best Practices

The article explains cardinality counting in Presto, comparing exact set‑based methods with memory‑efficient bitmap, Linear Count, and HyperLogLog approximations, detailing their algorithms, implementation in Presto’s query engine, and offering best‑practice recommendations for choosing the appropriate technique in business workloads.

HyperLogLog · SQL · bitmap

16 min read

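Linear Counting, one of the approximations the cardinality article compares, hashes each value into an m-bit bitmap and estimates the distinct count as n ≈ -m · ln(V), where V is the fraction of bits still zero. This is the textbook estimator, not Presto's code (Presto's own `approx_distinct` is HyperLogLog-based); `m` is a hypothetical default.

```python
import hashlib
import math
from typing import Iterable


def linear_count(values: Iterable[str], m: int = 1 << 14) -> float:
    """Estimate the number of distinct values using an m-bit bitmap (m must be a multiple of 8)."""
    bitmap = bytearray(m // 8)
    for v in values:
        h = int.from_bytes(hashlib.sha256(v.encode()).digest()[:8], "big") % m
        bitmap[h >> 3] |= 1 << (h & 7)  # set bit h
    set_bits = sum(bin(b).count("1") for b in bitmap)
    zero_fraction = (m - set_bits) / m
    if zero_fraction == 0:
        raise ValueError("bitmap saturated; increase m")
    return -m * math.log(zero_fraction)


values = [f"user-{i % 5000}" for i in range(20_000)]  # 5000 distinct values, each seen 4 times
exact = len(set(values))        # exact: memory grows with the number of distinct values
estimate = linear_count(values)  # approximate: fixed 2 KiB bitmap
```

Memory stays fixed at m bits no matter how many rows are scanned, but accuracy degrades as the bitmap fills, which is why HyperLogLog is preferred at very high cardinalities.
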
Past Memory Big Data

Oct 13, 2022 · Big Data

Step-by-Step Guide: Integrating Presto with Velox on macOS (Build, Configure, and Run)

This article discusses CPU as the performance bottleneck in data analytics, introduces the Velox vectorized execution engine, and provides a detailed, zero‑to‑one tutorial: downloading the Presto source, syncing Velox, fixing build paths, compiling the Java and C++ components, configuring CLion and IntelliJ, launching the servers, and executing SQL queries, while noting stability concerns.

Java · SQL · Velox

19 min read

DataFunSummit

Oct 3, 2022 · Big Data

Optimizing Point‑Query Performance in Presto with Apache Hudi Data Skipping and Layout Techniques

This article explains how Huawei Cloud leverages Apache Hudi and HetuEngine (Presto) to improve point‑query performance on Lakehouse architectures through data layout optimization, file‑skipping techniques, metadata tables, and extensive benchmark results demonstrating multi‑fold speedups.

Apache Hudi · Big Data · Data Skipping

11 min read

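File skipping for point queries relies on per-file column statistics (Hudi can keep these in its metadata table): a file is read only if its [min, max] range for the filtered column could contain the value. This in-memory sketch, with hypothetical structures, also shows why data layout matters, since clustered files have narrow ranges and scattered ones do not.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FileStats:
    path: str
    col_ranges: Dict[str, Tuple[int, int]]  # column -> (min, max) within this file


def files_for_point_query(files: List[FileStats], column: str, value: int) -> List[str]:
    """Keep only files whose [min, max] for `column` could contain `value`."""
    keep = []
    for f in files:
        rng = f.col_ranges.get(column)
        if rng is None or rng[0] <= value <= rng[1]:
            keep.append(f.path)  # missing stats: the file must be scanned
    return keep


# Same ids, two layouts: sorted into disjoint ranges vs. spread across every file.
clustered = [FileStats(f"f{i}", {"id": (i * 100, i * 100 + 99)}) for i in range(3)]
scattered = [FileStats(f"g{i}", {"id": (0, 299)}) for i in range(3)]
print(files_for_point_query(clustered, "id", 150))  # ['f1']
print(files_for_point_query(scattered, "id", 150))  # ['g0', 'g1', 'g2']
```
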
DataFunSummit

Sep 30, 2022 · Big Data

MercsDB: Architecture, Storage, Computation, and Optimization of Tencent's MPP Data Warehouse Engine

The article presents a comprehensive technical overview of MercsDB—formerly HermesDB—including its background, storage and indexing designs, native and Presto computation engines, vectorization optimizations, benchmark results, real‑world applications, and future development plans.

Big Data · Columnar Storage · MPP

20 min read

Past Memory Big Data

Sep 13, 2022 · Databases

Velox: An Open‑Source Unified Execution Engine for Data Systems

Velox is Meta's open‑source unified execution engine that consolidates common data‑intensive components, integrates with engines like Presto, Spark, and TorchArrow, and delivers up to ten‑fold speedups on CPU‑bound queries while simplifying development and fostering a reusable, community‑driven ecosystem.

Data Management · Performance · Spark

9 min read

DataFunSummit

Sep 5, 2022 · Big Data

DataFun Summit 2022 – Modern Data Stack Forum: Speaker Lineup and Session Overviews

The DataFun Summit 2022 featured a Data Lake & Warehouse forum with expert talks on PALO, ByteDance LAS, Iceberg at Huawei, and Presto‑Alluxio acceleration, providing detailed technical outlines, speaker backgrounds, and audience takeaways for modern big‑data architectures.

Apache Iceberg · Big Data · Data Lake

7 min read

DataFunTalk

Aug 31, 2022 · Big Data

Alluxio Data Orchestration and Cache Acceleration in China Unicom: Use Cases and Performance Gains

This article presents Zhang Ce's detailed overview of Alluxio's deployment at China Unicom, covering cache acceleration, compute‑storage separation, mixed‑load workloads, and lightweight analysis, and demonstrates how these strategies dramatically improve performance, scalability, and cost efficiency for big data processing.

Alluxio · Cache Acceleration · Data Orchestration

19 min read

Big Data Technology Architecture

Aug 23, 2022 · Big Data

Apache Hudi 0.12.0 Release Highlights: Presto Connector, Archive Beyond Savepoint, File‑System Locks, Deltastreamer Termination, Spark & Flink Support, Performance Improvements, and Configuration Updates

The Apache Hudi 0.12.0 release introduces a native Presto connector, archive‑beyond‑savepoint capability, file‑system based locking, new deltastreamer termination strategies, expanded Spark and Flink support, numerous performance enhancements, and a series of configuration and API updates for better data‑lake management.

Apache Hudi · Flink · Spark

12 min read

ITPUB

Jul 23, 2022 · Information Security

How Bilibili Secured Hadoop: Ranger‑Based HDFS and Hive Access Control Deep Dive

This article details Bilibili's implementation of Apache Ranger for fine‑grained access control across Hadoop, HDFS, Hive, Spark, and Presto, covering architecture, API redesign, admin optimizations, gray‑release strategies, permission pre‑checks, data masking, and future plans for incremental policy loading.

Access Control · Data Security · HDFS

16 min read

Bilibili Tech

Jul 22, 2022 · Information Security

Design and Optimization of Ranger‑Based Access Control for HDFS and Hive in Bilibili's Data Platform

Bilibili’s data platform redesigns Ranger‑based access control by simplifying HDFS and Hive policy APIs, parallelizing policy loading, adding gray‑release and pre‑check mechanisms, integrating fine‑grained Hive authorization with data‑masking, extending support to Spark and Presto, and planning incremental loading, policy fusion, and a NameNode proxy to boost security and performance.

Access Control · HDFS · Hive

15 min read

Alibaba Cloud Big Data AI Platform

Jul 21, 2022 · Big Data

Boosting Offline Data Warehouse Performance with DeltaLake: Key Strategies

This article details how Zuoyebang migrated its Hive‑based offline data warehouse to DeltaLake, addressing latency, incremental updates, and query performance through stream‑to‑batch processing, dynamic partition pruning, and Z‑order optimization, resulting in faster data readiness and analyst queries.

Big Data · DeltaLake · Hive

17 min read

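Z-order clustering (exposed in Delta Lake as `OPTIMIZE ... ZORDER BY`) sorts rows by a Morton value that interleaves the bits of several columns, so rows close in every chosen column tend to land in the same file and min/max stats can prune on any of them. This sketch shows only the bit interleaving for two small non-negative integer columns; it is not Delta's implementation.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y: x on even bit positions, y on odd ones."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


points = [(x, y) for x in range(4) for y in range(4)]
z_sorted = sorted(points, key=lambda p: z_value(*p))
# The order walks one 2x2 block, (0, 0), (1, 0), (0, 1), (1, 1), before moving to the next.
```

Sorting by a single column would instead keep only that column's ranges tight, leaving the other column spread across every file.
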
Architect

May 17, 2022 · Big Data

Design and Architecture of an Integrated BI Platform Using Apache Kylin for Large‑Scale OLAP

The article explains the challenges of big‑data analytics, introduces pre‑computation OLAP concepts, and details how Apache Kylin together with Spark, Flink, Presto and other components can be integrated into a BI platform to achieve near‑real‑time query performance on massive datasets.

Apache Kylin · BI · Data Warehouse

11 min read

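Pre-computation OLAP in the Kylin style materializes aggregates for combinations of dimensions (cuboids) ahead of time, so a query becomes a lookup instead of a scan. This naive sketch builds all 2^n cuboids for a SUM measure; real systems prune that set (Kylin, for example, uses aggregation groups). Column names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Row = Dict[str, object]
Cube = Dict[Tuple[str, ...], Dict[tuple, int]]


def build_cuboids(rows: List[Row], dims: List[str], measure: str) -> Cube:
    """Pre-aggregate SUM(measure) for every subset of `dims` (2 ** len(dims) cuboids)."""
    cube: Cube = {}
    for r in range(len(dims) + 1):
        for subset in combinations(dims, r):
            agg: Dict[tuple, int] = defaultdict(int)
            for row in rows:
                agg[tuple(row[d] for d in subset)] += row[measure]
            cube[subset] = dict(agg)
    return cube


rows = [
    {"city": "A", "day": 1, "sales": 10},
    {"city": "A", "day": 2, "sales": 5},
    {"city": "B", "day": 1, "sales": 7},
]
cube = build_cuboids(rows, ["city", "day"], "sales")
# "SELECT city, SUM(sales) ... GROUP BY city" becomes a lookup in cube[("city",)].
```

The cost moves to build time and storage, which grow exponentially with the number of dimensions.
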
dbaplus Community

May 12, 2022 · Big Data

How Bilibili Scaled Presto on Hadoop: Architecture, Optimizations, and Performance Gains

This article details Bilibili's end‑to‑end Presto on Hadoop architecture, covering the multi‑engine SQL stack, dispatcher routing, cluster scale, stability enhancements like coordinator HA and real‑time penalties, query limits, Hive UDF compatibility, insert‑overwrite support, Alluxio caching, multi‑datacenter routing, query result caching, RaptorX local cache, JDK upgrades, dynamic filtering, and the future roadmap, illustrating how these innovations boosted query throughput and reduced latency.

Big Data · Hadoop · Performance Optimization

32 min read

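Dynamic filtering, listed among the Bilibili optimizations, collects the join keys from the build side (often a small, selectively filtered dimension table) and applies them to the probe side's scan, so non-matching rows, or whole partitions, are never read. In Presto the filter is pushed into the connector's scan; this in-memory sketch with hypothetical names only imitates the effect on an inner hash join.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def join_with_dynamic_filter(build: List[Row], probe: Iterable[Row],
                             build_key: str, probe_key: str) -> Tuple[List[Row], int]:
    """Inner hash join whose build-side key set filters the probe side early.

    Returns (joined rows, number of probe rows that reached the join).
    """
    table: Dict[object, List[Row]] = defaultdict(list)
    for b in build:
        table[b[build_key]].append(b)
    dynamic_filter = set(table)  # an engine would push this set into the probe-side scan
    out: List[Row] = []
    reached = 0
    for p in probe:
        if p[probe_key] not in dynamic_filter:
            continue  # pruned before the join
        reached += 1
        for b in table[p[probe_key]]:
            out.append({**p, **b})
    return out, reached


dims = [{"d_id": 1, "name": "x"}]                      # filtered dimension table
facts = [{"f_d": i % 10, "amt": i} for i in range(100)]  # large fact table
joined, reached = join_with_dynamic_filter(dims, facts, "d_id", "f_d")
```

Only a tenth of the fact rows reach the join here; when the filter column is also a partition key, the savings come from never opening the other partitions at all.
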
Big Data Technology & Architecture

Apr 26, 2022 · Big Data

ByteDance's Internal Presto OLAP Engine: Deployment, Performance Boosts, and Operational Practices

The article details ByteDance's large‑scale deployment of the Presto OLAP engine for ad‑hoc, BI, and near‑real‑time analytics, describing its architecture, multi‑coordinator high‑availability design, routing gateway, adaptive cancel, history server, materialized‑view support, Hudi connector integration, and how these innovations improve performance, stability, and operational efficiency.

Big Data · High Availability · Hudi Connector

11 min read

Zuoyebang Tech Team

Apr 13, 2022 · Big Data

How Delta Lake Transformed Our Offline Data Warehouse Performance

This article details how ZuoYeBang's engineering team migrated their Hive‑based offline data warehouse to Delta Lake, tackling latency, scalability, and query‑performance challenges through stream‑to‑batch processing, data‑lake architecture, and optimizations like DPP and Z‑ordering.

Big Data · Delta Lake · Hive

15 min read

Architect

Apr 11, 2022 · Big Data

Design, Optimization, and Future Roadmap of Bilibili's Presto SQL‑on‑Hadoop Architecture

This article details Bilibili's end‑to‑end Presto‑based SQL‑on‑Hadoop architecture, covering overall system components, query routing, Presto feature set, extensive stability and availability enhancements, performance boosts through caching and multi‑datacenter deployment, and outlines future development plans.

Hadoop · Kubernetes · Performance Optimization

28 min read

Bilibili Tech

Apr 9, 2022 · Big Data

Bilibili Presto on Hadoop: Architecture, Scaling, and Performance Enhancements

Bilibili’s Presto on Hadoop combines a multi‑engine offline platform with Kubernetes‑managed YARN scheduling, Ranger security, and a custom dispatcher, scaling to over 400 nodes that handle 160k daily queries over 10 PB of data, while adding coordinator HA, resource‑group penalties, query limits, Alluxio caching, dynamic filtering, and numerous SQL‑level enhancements, with auto‑scaling and materialized‑view automation planned for the future.

Big Data · Hadoop · SQL

30 min read

Big Data Technology & Architecture

Feb 28, 2022 · Big Data

Integrating Apache Hudi with Hive, Presto, and Spark SQL: Installation, Operations, and Query Examples

This article provides a step‑by‑step guide on integrating Apache Hudi with Hive and Presto, demonstrates core Hudi operations such as insert, upsert, delete, query, and Hive synchronization using Scala code, and shows how to manage Hudi tables through Spark SQL DDL/DML commands.

Apache Hudi · Big Data · Data Lake

16 min read

IT Architects Alliance

Jan 24, 2022 · Big Data

How to Build a Scalable Big Data Access Control System with Hive, Presto, and Ranger

This article details the design and implementation of a comprehensive big data permission system that integrates Hive, Presto, Hadoop, and Metabase, covering data access methods, authentication choices, Ranger-based authorization, policy management, and automated workflow integration to balance security and efficiency.

Access Control · Apache Ranger · Big Data

16 min read

Volcano Engine Developer Services

Dec 29, 2021 · Big Data

Scaling Presto at ByteDance: Architecture, Performance & Stability

ByteDance’s internal Presto platform, supporting nearly one million daily queries across ad‑hoc, BI visualization, and near‑real‑time analytics, achieves high performance and stability through SparkSQL compatibility, multi‑Coordinator architecture, dynamic routing, adaptive query cancellation, History Server, materialized views, and a dedicated Hudi connector.

Distributed SQL · Hudi Connector · Materialized Views

0 likes · 11 min read

DataFunSummit

Dec 18, 2021 · Big Data

Fast OLAP Forum – Latest Practices and Innovations in Real‑Time OLAP

The Fast OLAP Forum, taking place on December 19 at DataFunCon, brings together leading experts from Baidu, Tencent, JD, and FreeWheel to share cutting‑edge techniques in vectorized execution, cloud‑native ClickHouse, large‑scale OLAP architectures, and Presto optimizations, offering deep insights for practitioners dealing with massive real‑time data workloads.

Apache Doris · Big Data · ClickHouse

0 likes · 7 min read

Top Architect

Dec 13, 2021 · Big Data

Design and Implementation of BanYu's Big Data Access Control System

This article describes the evolution from an unsecured data warehouse to a comprehensive big‑data access control system at BanYu, detailing the background, data access methods, design goals, authentication and authorization mechanisms, policy configuration, integration with Metabase, and the overall workflow that balances security with efficiency.

Access Control · Big Data · Hive

0 likes · 15 min read

Architecture Digest

Dec 11, 2021 · Big Data

Design and Implementation of BanYu's Big Data Permission System

This article describes the background, design goals, authentication and authorization mechanisms, system architecture, policy configuration, and Metabase integration of BanYu's big data permission system, highlighting how it balances security and efficiency across Hive, Presto, HDFS, and other components.

Access Control · Apache Ranger · Data Security

0 likes · 16 min read

IT Architects Alliance

Dec 11, 2021 · Big Data

Design and Implementation of Banyu's Big Data Permission System

This article describes the background, design goals, authentication and authorization mechanisms, system architecture, policy configuration, and Metabase integration of Banyu's big data permission system, which secures Hive, Presto, HDFS and other data access components using Apache Ranger and LDAP.

Access Control · Apache Ranger · Big Data

0 likes · 14 min read

21CTO

Dec 9, 2021 · Big Data

Designing a Scalable Big Data Permission System: From Hive to Metabase

BanYu’s early data warehouse lacked any access controls, prompting the creation of a comprehensive big‑data permission system that integrates authentication and authorization across Hive, Presto, HDFS, and Metabase using LDAP, Ranger policies, workflow automation, and both synchronous and asynchronous policy initialization.

Authorization · Big Data · Data Security

0 likes · 16 min read

Big Data Technology & Architecture

Dec 8, 2021 · Big Data

Presto Overview, Architecture, and Query Optimization Techniques

This article introduces Presto, an open‑source MPP SQL engine, explains its coordinator‑worker architecture and connector model, and provides detailed storage, query, and join optimization strategies—including in‑memory parallelism, dynamic plan compilation, and practical SQL code examples—to achieve low‑latency, high‑performance analytics on big data.

Big Data · Query Optimization · SQL

0 likes · 7 min read

DataFunTalk

Nov 6, 2021 · Big Data

Evolution and Practices of OLAP at Vipshop: Presto, ClickHouse, and Kylin

This article details Vipshop's OLAP evolution, covering the deployment, optimization, and containerization of Presto, ClickHouse, and Kylin, the challenges faced, self‑developed tooling, and future directions for intelligent scaling and resource management.

Big Data · ClickHouse · Flink

0 likes · 27 min read

DataFunSummit

Oct 21, 2021 · Big Data

Presto High‑Performance Engine Practice at Meitu: Technical Selection, HA Design, and Cross‑Cluster Scheduling

This article details Meitu's adoption of the Presto ad‑hoc ROLAP engine, comparing it with Hive on Spark and Impala, describing two coordinator high‑availability solutions, and explaining the cross‑cluster scheduling architecture that leverages idle Presto resources to improve overall big‑data processing efficiency.

Big Data · Cloud Computing · Cross-Cluster Scheduling

0 likes · 16 min read

21CTO

Oct 6, 2021 · Big Data

Building a Real-Time TB-Scale Bill Query System with Kafka, Kudu, and Presto

This article details the design and implementation of a real‑time, TB‑scale bill‑detail query platform that leverages Kafka for streaming, Debezium and Confluent Platform for change capture, Kudu for low‑latency storage, and Presto/Kylin for fast OLAP queries, while outlining deployment, integration, and future enhancements.

Kafka · Kudu · Real-time Data

0 likes · 19 min read

Architect

Oct 6, 2021 · Big Data

Design and Implementation of a Real-time and Offline Integrated Query System

This article details the requirements, architecture, and implementation of a real-time and offline integrated query system, covering data ingestion via Debezium and Confluent Platform, storage in Kudu and HDFS, query engines Presto and Kylin, and strategies for data synchronization, partitioning, and scaling.

Big Data · Data Warehouse · Debezium

0 likes · 19 min read

DataFunTalk

Sep 10, 2021 · Big Data

Presto High‑Performance Engine Practice at Meitu: Technical Selection, HA Design, and Cross‑Cluster Scheduling

This article details Meitu's adoption of the Presto ad‑hoc ROLAP engine, comparing it with Hive on Spark and Impala, describing enhancements for coordinator high‑availability, and explaining a cross‑cluster scheduling strategy that leverages idle Presto resources to improve overall big‑data workload efficiency.

Big Data · Cross-Cluster Scheduling · Data Engineering

0 likes · 16 min read

DataFunTalk

Aug 14, 2021 · Databases

Evolution of OLAP Engines at Lenovo Liancheng Zhida and DorisDB Adoption

The article chronicles Lenovo Liancheng Zhida’s three‑stage evolution of OLAP engines—from early SQL Server scripts, through a Hadoop‑based Presto solution, to the adoption of DorisDB—detailing architecture, tool comparisons, implementation practices, and the performance and operational benefits achieved.

Analytics · Big Data · DorisDB

0 likes · 12 min read

Big Data Technology & Architecture

Jun 21, 2021 · Big Data

Comprehensive Guide to Apache Kylin: Background, Architecture, Installation, Optimization, and Real‑World Use Cases

This article provides an in‑depth overview of Apache Kylin, covering its history, mission, core MOLAP principles, technical architecture, step‑by‑step installation (Docker and Hadoop), performance tuning, advanced cube settings, and detailed case studies from major companies such as Baidu, Lianjia, and Didi.

Apache Kylin · Cube · Docker

0 likes · 53 min read

Big Data Technology & Architecture

Jun 17, 2021 · Big Data

Comprehensive Guide to Presto: Origins, Architecture, Optimization, and Real‑World Applications

This article provides an in‑depth overview of Presto, covering its history, core principles, architectural components, query optimization techniques, resource management, tuning tips, data model, and case studies from companies like Didi and Youzan, offering practical guidance for deploying and operating the distributed SQL engine at scale.

Optimization · Query Engine · Resource Management

0 likes · 33 min read

dbaplus Community

May 27, 2021 · Big Data

How Vipshop Scales Billion‑Row OLAP with ClickHouse, Presto, and Flink

This article details Vipshop's OLAP evolution: how Presto, Kylin, and ClickHouse are integrated, the deployment architecture with HAProxy and chproxy, containerization on Kubernetes, and the Flink‑ClickHouse pipeline that enables self‑service analysis of hundred‑billion‑row datasets, along with the performance challenges encountered and the future roadmap.

Big Data · ClickHouse · Data Warehouse

0 likes · 28 min read

Qu Tech

May 6, 2021 · Big Data

How JuiceFS Cut HDFS Load by 26% and Boosted Presto Query Speed by 13%

This case study details how integrating JuiceFS with Presto reduced HDFS cluster load by about 26%, achieved over 90% cache hit rate for ad‑hoc queries, and lowered average query latency by roughly 13%, while simplifying operations and improving system stability.

Big Data · Cache · HDFS

0 likes · 9 min read

DataFunTalk

Feb 16, 2021 · Big Data

Understanding Presto: Architecture, Query Execution, and Youzan’s Practical Experience

This article explains Presto’s core architecture and low‑latency query execution process, describes how Youzan adopts Presto for various data‑platform scenarios, discusses the evolution of its deployment, and outlines the performance challenges and future enhancements such as Alluxio integration and session property management.

Big Data · Performance Optimization · SQL

0 likes · 13 min read

DataFunTalk

Jan 6, 2021 · Big Data

Didi's Presto Engine: Architecture, Optimizations, and Operational Practices

This article presents Didi's three‑year experience with Presto, detailing its architecture, low‑latency design, large‑scale deployment, extensive Hive compatibility work, resource isolation, Druid connector integration, usability enhancements, stability engineering, performance tuning, and future directions for the ad‑hoc query engine.

Big Data · Druid Connector · Performance Optimization

0 likes · 17 min read

Liulishuo Tech Team

Dec 31, 2020 · Big Data

Migrating a Petabyte-Scale Big Data Platform to Alibaba Cloud: Architecture, Challenges, and Lessons Learned

This article details the end‑to‑end migration of a petabyte‑scale big‑data platform to Alibaba Cloud, describing the DSS synchronization system, its integration with Hive Metastore and Airflow, the gray‑release strategy, data‑consistency validation using Presto, and key takeaways for future cloud migrations.

Big Data Migration · Cloud Migration · DSS

0 likes · 10 min read

Zhongtong Tech

Oct 30, 2020 · Big Data

How Apache Kylin Supercharged OLAP at ZTO Express: A Deep Dive

This article details ZTO Express's journey of adopting Apache Kylin for OLAP, comparing it with Presto, describing platform architecture, performance gains, integration with scheduling and monitoring systems, and the practical optimizations and future plans that enabled sub‑second query responses on massive daily data volumes.

Apache Kylin · Big Data · HBase

0 likes · 16 min read

Alibaba Cloud Developer

Oct 25, 2020 · Big Data

How Alibaba’s Cloud‑Native Data Lake Solves Big Data Challenges

Alibaba Cloud’s Data Lake Analytics (DLA) tackles the growing complexity of data scenarios by offering cloud‑native, serverless solutions for data lake management, massive metadata construction, and high‑performance Spark and Presto engines, while addressing challenges such as high entry barriers, stability, and multi‑tenant isolation.

Cloud Native · Data Lake · Serverless Spark

0 likes · 22 min read

ITPUB

Oct 10, 2020 · Big Data

How Didi Scaled Presto for Petabyte‑Scale Queries: Architecture & Optimizations

Didi’s three‑year journey with Presto transformed it into the company’s primary ad‑hoc and Hive‑SQL acceleration engine, serving over 6,000 users, processing 2‑3 PB of HDFS data daily, and achieving major gains in stability, performance, cost, and usability through extensive architectural tweaks, resource isolation, connector extensions, and monitoring enhancements.

Big Data · Druid Connector · Hive Compatibility

0 likes · 18 min read

Didi Tech

Oct 9, 2020 · Big Data

Presto at Didi: Architecture, Optimizations, and Operational Experience

At Didi, Presto has been the default ad‑hoc and Hive‑SQL engine for over three years, serving 6,000 users and processing 2‑3 PB and 30‑35 trillion rows of data daily. The article covers mixed and dedicated clusters, the migration to PrestoSQL 340, extensive Hive compatibility work, label‑based isolation, a native Druid connector, usability and stability enhancements, and JVM‑level performance optimizations, as well as plans for further resource‑saving upgrades.

Big Data · Distributed SQL · Druid Connector

0 likes · 17 min read

Architects Research Society

Aug 6, 2020 · Big Data

Differences Between Spark SQL and Presto: A Comparative Overview

This article compares Spark SQL and Presto, explaining their architectures, key differences, performance characteristics, supported connectors, installation requirements, and typical use cases, while providing head‑to‑head tables and examples of federated queries.

Comparison · SQL Engines · Spark SQL

0 likes · 10 min read

Huawei Cloud Developer Alliance

Jun 3, 2020 · Big Data

How to Connect Python to Presto on Huawei MRS: Step-by-Step Guide & Common Pitfalls

Learn how to set up a Python environment on an Ubuntu ECS, install the presto-python-client and PyHive libraries, configure Kerberos and SSL credentials, run sample queries against a Presto coordinator, and avoid typical errors such as NTP, SSL, and authentication issues.

Big Data · Kerberos · PyHive

0 likes · 6 min read

Big Data Technology Architecture

May 31, 2020 · Big Data

Applying Apache Hudi in Medical Big Data: Architecture, Synchronization, Storage Choices, and Future Directions

This article examines the use of Apache Hudi for building a hospital‑wide medical big‑data platform, covering construction background, reasons for selecting Hudi, data synchronization methods, storage mode choices, query optimizations, and future development considerations.

Apache Hudi · Copy-on-Write · Data synchronization

0 likes · 7 min read

Youzan Coder

Apr 1, 2020 · Big Data

Presto Implementation and Practice at YouZan: A Big Data Query Engine Journey

The article outlines Presto’s high‑performance, coordinator‑worker architecture and query flow, describes YouZan’s migration from mixed Hadoop deployment to dedicated low‑latency clusters, details challenges such as small‑file handling and regex backtracking with their fixes, and previews future enhancements like Alluxio integration, session property managers, and Ranger‑based multi‑tenant isolation.

Distributed Computing · Facebook · HDFS

0 likes · 14 min read

dbaplus Community

Jan 7, 2020 · Databases

Why ClickHouse Beats Presto for Real‑Time Metrics: A Deep Dive

This article examines the shortcomings of a Storm‑based real‑time metric platform, outlines the requirements for a stable, SQL‑driven, fast engine, and explains why ClickHouse was chosen over Presto, detailing performance benchmarks, architectural advantages, cluster configuration, engine options, best practices, and common operational issues.

ClickHouse · Performance Tuning · merge-tree

0 likes · 18 min read

NetEase Game Operations Platform

Dec 5, 2018 · Big Data

Presto + Alluxio Architecture for Interactive Ad‑hoc Queries in NetEase Game Data Warehouse

This article describes how NetEase Games built a Presto‑based interactive ad‑hoc query platform backed by Alluxio caching to achieve sub‑10‑second query latency, outlines the architectural design, performance comparisons with other Hadoop‑based solutions, encountered issues, and future improvement plans.

Alluxio · Big Data · Data Warehouse

0 likes · 10 min read

Ctrip Technology

Jul 3, 2018 · Big Data

Ctrip's Presto Engine: Challenges, Improvements, and Upgrade Roadmap

This article details Ctrip's experience with the Presto distributed SQL engine, outlining the initial performance and stability issues, the comprehensive enhancements made in security, resource control, compatibility, and monitoring, and the multi‑stage upgrade plan that guides its future evolution.

Big Data · Kerberos · Monitoring

0 likes · 11 min read

Ctrip Technology

Aug 10, 2017 · Big Data

Design and Implementation of Ctrip's Large-Scale Data Platform

This article details the architectural choices, component selection, performance tuning, and team organization behind Ctrip's big‑data platform, covering Kafka, Presto, Elasticsearch, Gobblin, Zeppelin, REST APIs, and job scheduling to achieve scalable, interactive data analysis and visualization.

ETL · Elasticsearch · presto

0 likes · 18 min read

dbaplus Community

Aug 3, 2017 · Big Data

How Ctrip Built a Scalable Data Platform with Presto, Elasticsearch, and Gobblin

This article summarizes Xu Peng's DAMS 2017 presentation on selecting big‑data platform components, designing ETL pipelines, choosing analysis engines, optimizing Elasticsearch, and building a data‑driven team at Ctrip.

Big Data Architecture · Cluster Tuning · Data Platform

0 likes · 23 min read

dbaplus Community

Jul 16, 2017 · Big Data

How Vipshop Scaled Real‑Time OLAP: From Greenplum to Presto, Kylin, and Redis

Vipshop's massive data growth overwhelmed its traditional RDBMS, causing slow OLAP queries, inefficient ETL, and long development cycles, so the company iteratively rebuilt its analytics stack (adding Hadoop/Hive, a self‑service UI, Presto, Kylin, and Redis) to achieve sub‑second query responses, higher concurrency, and a flexible, low‑latency BI solution.

Data Warehouse · Kylin · OLAP

0 likes · 23 min read

Liulishuo Tech Team

Sep 24, 2016 · Backend Development

Developing Custom Presto SQL Functions (UDF) with Java Plugins

This tutorial explains how to create, register, and deploy custom scalar, aggregation, and window functions for the Presto distributed query engine using Java annotations, the Presto plugin mechanism, and code examples that illustrate UDF development, plugin packaging, and state handling for aggregation functions.

Aggregation · Java · Plugin

0 likes · 11 min read

Liulishuo Tech Team

Jun 17, 2016 · Big Data

Building a Scalable Big Data Platform on AWS: Architecture and Execution Service Design

This article details the architectural design and implementation of a scalable big data platform built on AWS services, highlighting the transition from HDFS to S3 for storage, the use of EMR for elastic compute, and a custom Execution Service integrated with Consul and Airflow for automated cluster management and task scheduling.

AWS EMR · Airflow · Big Data Architecture

0 likes · 11 min read

21CTO

Mar 31, 2016 · Big Data

Inside Airbnb’s Massive Big Data Platform: Architecture, Lessons & Scaling Secrets

Airbnb’s engineering team outlines the evolution of its big‑data platform, detailing the philosophy behind its architecture, the dual “gold” and “silver” Hive clusters, migration to Mesos, use of Presto, Airpal, Airflow, and the performance and cost gains achieved through these design choices.

Airbnb · Airflow · Big Data

0 likes · 11 min read

Art of Distributed System Architecture Design

Mar 31, 2016 · Big Data

Airbnb’s Big Data Platform Architecture: Design, Evolution, and Lessons Learned

Airbnb’s engineering team outlines the evolution and design of its massive big‑data platform—detailing the dual “gold” and “silver” Hive clusters, use of Kafka, Presto, Airflow, Mesos, and Spark, along with performance gains, cost reductions, and open‑source contributions.

Airbnb · Airflow · Big Data

0 likes · 13 min read
