
Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native environments, revealing how cloud storage cost models affect I/O optimization, and presents Uber Presto case‑study findings that highlight fragmented access patterns and financial implications of storage API calls.

DataFunTalk

Aug 11, 2024


The paper explores the widespread industry shift of migrating data‑intensive analytics applications from on‑premises to cloud‑native environments, emphasizing that the unique cost model of cloud storage demands a more granular understanding of performance optimization.

Through an empirical study of Uber's production Presto workload, the authors demonstrate that traditional I/O optimizations, which ignore the financial cost of storage API calls, can lead to unexpectedly high expenses in cloud settings.
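To make the point about per-request billing concrete, here is a minimal sketch of the arithmetic; it is not the whitepaper's own cost model. It assumes that every read maps to exactly one billed storage API call, and the price constant `HYPOTHETICAL_PRICE` is an illustrative placeholder, since real prices vary by provider and storage tier. The function `request_cost` and its parameters are names introduced here for illustration only.

```python
def request_cost(total_bytes: int, read_size: int, price_per_1k_requests: float) -> float:
    """Estimate the request bill for reading total_bytes in fixed-size reads.

    Assumption: each read is one billed storage API call.
    """
    if read_size <= 0:
        raise ValueError("read_size must be positive")
    # Integer ceiling division: a partial final read still costs one request.
    num_requests = (total_bytes + read_size - 1) // read_size
    return num_requests / 1000 * price_per_1k_requests


ONE_TIB = 1024 ** 4
HYPOTHETICAL_PRICE = 0.0004  # assumed USD per 1,000 requests, not a quoted price

small_reads = request_cost(ONE_TIB, 10 * 1024, HYPOTHETICAL_PRICE)
large_reads = request_cost(ONE_TIB, 8 * 1024 * 1024, HYPOTHETICAL_PRICE)

print(f"1 TiB in 10 KiB reads: ${small_reads:.2f} in request charges")
print(f"1 TiB in 8 MiB reads:  ${large_reads:.2f} in request charges")
```

Under these assumptions the bytes transferred are identical in both cases, yet the request charge scales with the number of calls. An optimization that only looks at throughput or latency would not see that difference.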

Key observations include highly fragmented data-access patterns: over 50% of accesses are smaller than 10 KB and more than 90% are under 1 MB. Because cloud storage charges for API calls, access characteristics like these carry very different cost and performance implications once the storage backend is a cloud service.
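The percentages above are cumulative shares of read sizes. The sketch below shows how such a summary could be computed from a trace of read sizes, and why it matters under per-request billing. The `trace` values are made up for illustration and are not Uber data. The helper names `share_of_requests_below` and `share_of_bytes_below` are hypothetical, and treating 1 KB as 1024 bytes is an assumption.

```python
KB = 1024  # assumption: binary kilobytes
MB = 1024 * KB


def share_of_requests_below(sizes, threshold):
    """Fraction of reads whose size is strictly below threshold."""
    if not sizes:
        raise ValueError("sizes must not be empty")
    return sum(1 for s in sizes if s < threshold) / len(sizes)


def share_of_bytes_below(sizes, threshold):
    """Fraction of total bytes carried by reads strictly below threshold."""
    total = sum(sizes)
    if total == 0:
        raise ValueError("trace contains no bytes")
    return sum(s for s in sizes if s < threshold) / total


# Synthetic trace of read sizes in bytes (illustrative only).
trace = [2 * KB, 4 * KB, 8 * KB, 9 * KB, 6 * KB, 3 * KB,
         64 * KB, 256 * KB, 512 * KB, 4 * MB]

print(f"requests < 10 KB: {share_of_requests_below(trace, 10 * KB):.0%}")
print(f"requests < 1 MB:  {share_of_requests_below(trace, 1 * MB):.0%}")
print(f"bytes in reads < 10 KB: {share_of_bytes_below(trace, 10 * KB):.2%}")
```

In this synthetic trace, the small reads account for most of the requests but only a tiny share of the bytes. That is the situation in which a per-request charge can dominate the cost of the data actually read.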

The whitepaper presents a set of I/O optimization strategies for cloud environments, derived from the case study, and shows readers how to adjust their design assumptions and tactics to different cloud storage scenarios in order to improve cost-performance.

Core sections cover: (1) how cloud storage conditions should change design assumptions and strategy, and how they affect application design; (2) a detailed Uber-based case study showing the additional costs that I/O techniques introduce during cloud migration; and (3) a fresh perspective on system design in cloud computing, intended to help stakeholders cope with the rapid growth of data-intensive applications.

Readers will gain a concrete I/O optimization framework that can serve as a starting point for further research and enable the design of efficient I/O strategies specifically for data‑intensive workloads operating in cloud‑native environments.




