Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of moving data-intensive analytics workloads to cloud-native environments and analyzes the distinctive cost model of cloud storage. Drawing on case-study findings from Uber's production Presto system, it shows that I/O patterns there are highly fragmented and proposes optimization strategies to improve cost-performance in the cloud.

DataFunTalk

Aug 10, 2024

This whitepaper examines the industry-wide migration of data-intensive analytical applications from on-premises to cloud-native environments. It argues that the distinctive cost model of cloud storage, in which requests themselves are billed, requires rethinking how performance is optimized.

Using observations from Uber’s production deployment of Presto, the authors demonstrate that traditional I/O optimizations often overlook the financial cost of storage API calls, which can lead to unexpectedly high expenses in cloud settings.
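To make the point about API-call cost concrete, here is a minimal sketch of how request charges scale with read size when the same amount of data is fetched. The function names and the per-request price are hypothetical placeholders chosen for illustration. They are not figures from the whitepaper or from any provider's price list.

```python
def request_count(total_bytes: int, read_size: int) -> int:
    """Number of storage API calls needed to fetch total_bytes
    when each call returns at most read_size bytes."""
    if total_bytes < 0 or read_size <= 0:
        raise ValueError("total_bytes must be >= 0 and read_size must be > 0")
    # Integer ceiling division avoids floating-point rounding.
    return (total_bytes + read_size - 1) // read_size


def request_cost(total_bytes: int, read_size: int,
                 price_per_1000_requests: float) -> float:
    """Request charges only; data-transfer and storage fees are ignored."""
    return request_count(total_bytes, read_size) / 1000 * price_per_1000_requests


# Hypothetical price per 1,000 read requests (an assumption for illustration).
PRICE_PER_1000 = 0.0004

ONE_GIB = 1024 ** 3
small_reads = request_cost(ONE_GIB, 10 * 1024, PRICE_PER_1000)    # 10 KiB reads
large_reads = request_cost(ONE_GIB, 1024 * 1024, PRICE_PER_1000)  # 1 MiB reads

print(f"10 KiB reads: {request_count(ONE_GIB, 10 * 1024)} calls, cost {small_reads:.6f}")
print(f"1 MiB reads:  {request_count(ONE_GIB, 1024 * 1024)} calls, cost {large_reads:.6f}")
```

Under this simplified model the bytes read are identical in both cases, yet the small-read case issues roughly a hundred times more billable calls. An optimization that looks only at throughput or latency would not see that difference.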

The empirical study finds that Presto's data-access patterns at Uber are highly fragmented: over 50% of accesses are smaller than 10 KB, and more than 90% are under 1 MB. This contrasts sharply with conventional data-platform environments and underscores the need for cloud-specific I/O strategies.
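A reader who wants to check whether their own workload shows similar fragmentation could bucket a trace of read sizes against the same two thresholds. The sketch below assumes the trace is simply a list of read sizes in bytes. The helper name and the sample values are invented for illustration and are not Uber data.

```python
from typing import Dict, Iterable

KB = 1024
MB = 1024 * KB


def size_fractions(read_sizes: Iterable[int]) -> Dict[str, float]:
    """Fraction of reads below the 10 KB and 1 MB thresholds."""
    sizes = list(read_sizes)
    if not sizes:
        raise ValueError("read_sizes must contain at least one entry")
    total = len(sizes)
    return {
        "under_10KB": sum(1 for s in sizes if s < 10 * KB) / total,
        "under_1MB": sum(1 for s in sizes if s < MB) / total,
    }


# Made-up sample trace (bytes per read), for illustration only.
sample_trace = [512, 4 * KB, 8 * KB, 64 * KB, 2 * MB]
fractions = size_fractions(sample_trace)
print(fractions)
```

Comparing these two fractions with the figures reported for Uber gives a rough indication of whether per-request costs are likely to dominate for a given workload.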

Presented as a case study, the paper provides a logical framework and concrete strategies for I/O optimization. Its aim is to help readers design efficient I/O solutions tailored to cloud environments and significantly improve cost-performance for data-intensive workloads.

✓ Adapt assumptions and strategies to different cloud-storage scenarios and understand how those scenarios affect application design and performance.

✓ Examine the additional costs that widely used I/O optimization techniques may incur during enterprise-level cloud migrations, illustrated with Uber data.

✓ Offer a fresh perspective on system design in the cloud‑computing domain to assist stakeholders in addressing the rapid growth of data‑intensive applications.

The whitepaper serves as a starting point for further research, enabling readers to craft specialized I/O strategies that enhance efficiency for data‑intensive applications operating in cloud environments.
