Tagged articles

Apache Parquet

2 articles
Past Memory Big Data
Dec 27, 2024 · Big Data

How Uber Cuts Storage Costs with ZSTD Compression in Apache Parquet

Uber’s Hadoop data lake stores hundreds of petabytes in Parquet files. By adopting ZSTD compression, column pruning, and column reordering, Uber achieves up to 79% storage reduction and significant vCore savings. The article covers the detailed benchmarks that guided the choice of compression level, along with the related open‑source contributions.

Apache Parquet · Big Data · Hadoop
14 min read
dbaplus Community
May 26, 2016 · Big Data

Mastering Apache Parquet: Columnar Storage, Nested Data, and Performance Gains

This article explains Apache Parquet’s columnar storage format, its support for nested data models, the underlying striping/assembly algorithm, the file structure, push‑down optimizations, and its performance advantages within the Hadoop ecosystem. It is written as a guide for big‑data practitioners.

Apache Parquet · Big Data · Hadoop
22 min read