
Optimizing I/O for Data-Intensive Applications in Cloud-Native Environments: Insights from Uber Presto

This whitepaper examines the industry shift of data‑intensive analytics toward cloud‑native platforms, shows how cloud storage’s distinct cost model calls for careful I/O optimization, and presents findings from an Uber Presto case study that highlight fragmented access patterns and cost‑effective design strategies for high‑performance cloud workloads.

DataFunTalk

This whitepaper explores the growing industry trend of migrating data‑intensive analytics workloads from on‑premises to cloud‑native environments, emphasizing that the distinct cost model of cloud storage requires a deeper understanding of performance optimization.

Through an empirical study of Uber’s production Presto cluster, the authors find that more than 50% of data accesses are smaller than 10 KB and over 90% are under 1 MB. This highly fragmented access pattern translates into a large number of storage API calls, each of which carries a cost in the cloud.
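To make the cost implication concrete, here is a minimal sketch of how request count, and therefore request cost, scales with read size. The per-request price and the function names are hypothetical placeholders chosen for illustration; they are not figures from the whitepaper or from any provider's price list.

```python
def requests_needed(total_bytes: int, read_size: int) -> int:
    """Number of fixed-size reads needed to cover total_bytes (ceiling division)."""
    if read_size <= 0:
        raise ValueError("read_size must be positive")
    return -(-total_bytes // read_size)


def request_cost(num_requests: int, price_per_1000: float) -> float:
    """Cost of num_requests API calls at a given price per 1,000 requests."""
    return num_requests / 1000 * price_per_1000


if __name__ == "__main__":
    # Hypothetical price per 1,000 read requests; substitute your provider's rate.
    ASSUMED_PRICE_PER_1000 = 0.0004

    total = 1_000_000_000  # bytes to read in this illustration
    for read_size in (10_000, 1_000_000):  # roughly 10 KB and 1 MB reads
        n = requests_needed(total, read_size)
        cost = request_cost(n, ASSUMED_PRICE_PER_1000)
        print(f"read size {read_size:>9} B: {n:>7} requests, cost {cost:.6f}")
```

Under this model the same volume of data costs about 100 times more in request charges when read in 10 KB pieces than in 1 MB pieces, which is why the size distribution reported above matters.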

Based on these observations, the paper proposes a set of I/O‑optimization principles and strategies (such as adjusting data layout, leveraging cloud‑native storage features, and aligning application design with storage pricing) to achieve cost‑effective, high‑performance processing of data‑intensive applications.
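One common way to align application design with per-request pricing is to coalesce nearby small reads into fewer, larger range requests. The sketch below illustrates that general technique only; it is not taken from the whitepaper, and the `coalesce_ranges` function and its `max_gap` parameter are hypothetical names introduced here.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes


def coalesce_ranges(ranges: List[Range], max_gap: int) -> List[Range]:
    """Merge byte ranges whose gap is at most max_gap bytes.

    Merging trades a few extra bytes transferred (the gaps) for fewer
    storage API calls. Overlapping ranges are merged as well.
    """
    merged: List[Range] = []
    for offset, length in sorted(ranges):
        if merged:
            prev_offset, prev_length = merged[-1]
            prev_end = prev_offset + prev_length
            if offset - prev_end <= max_gap:
                # Extend the previous range to cover this one.
                new_end = max(prev_end, offset + length)
                merged[-1] = (prev_offset, new_end - prev_offset)
                continue
        merged.append((offset, length))
    return merged


if __name__ == "__main__":
    reads = [(0, 100), (150, 50), (10_000, 100)]
    # The first two reads are 50 bytes apart and get merged; the third stays separate.
    print(coalesce_ranges(reads, max_gap=64))
```

The choice of `max_gap` is a tuning decision: a larger value saves more requests but transfers more unneeded bytes, so a sensible setting depends on the provider's request and transfer prices.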

Readers will gain a concrete case‑study framework for designing efficient I/O solutions tailored to cloud environments, providing a foundation for further research and practical implementation.

cloud-native, big data, I/O optimization, presto, data-intensive, cost model
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
