Big Data Technology Architecture
Apr 8, 2021 · Big Data
Managing Small Files in Spark SQL: Causes, Impact, and Practical Solutions
This article explains the small-file problem in Spark SQL on HDFS and its impact on NameNode memory and query performance. It describes how dynamic partition inserts and shuffle settings generate many output files, and presents practical ways to control the file count: distributing rows by partition column, random bucketing, and adaptive query execution.
Hadoop · Performance · Small Files
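The file-count arithmetic behind the summary can be sketched with a small model. This is plain Python, not Spark code, and it assumes that in a dynamic partition insert each write task produces one file per distinct partition value it holds. The row count, the 10 partition values, the 4 buckets, and the helper name `count_output_files` are hypothetical values chosen for illustration; 200 tasks mirrors the default of `spark.sql.shuffle.partitions`.

```python
import random


def count_output_files(rows, assign_task):
    """Count files under the assumed model: one file per distinct
    (write task, partition value) pair."""
    written = set()
    for row_id, partition_value in rows:
        written.add((assign_task(row_id, partition_value), partition_value))
    return len(written)


NUM_PARTITION_VALUES = 10  # hypothetical: e.g. 10 distinct dates
NUM_TASKS = 200            # default value of spark.sql.shuffle.partitions
NUM_BUCKETS = 4            # hypothetical bucket count for random bucketing

# Each row is (row_id, partition_value).
rows = [(i, i % NUM_PARTITION_VALUES) for i in range(20_000)]
rng = random.Random(0)

# No redistribution on the partition column: every task ends up holding
# rows for every partition value, so files = tasks * partition values.
no_redistribution = count_output_files(
    rows, lambda i, p: (i // NUM_PARTITION_VALUES) % NUM_TASKS
)

# Distribute by the partition column: all rows of a partition value go
# to one task, so files = partition values (but one task may be overloaded).
by_partition = count_output_files(rows, lambda i, p: p)

# Distribute by partition column plus a random bucket: each partition value
# is spread over NUM_BUCKETS tasks, so files = partition values * buckets.
by_partition_and_bucket = count_output_files(
    rows, lambda i, p: p * NUM_BUCKETS + rng.randrange(NUM_BUCKETS)
)

print(no_redistribution, by_partition, by_partition_and_bucket)
```

In this model the bucket count is the tuning knob: more buckets spread a large or skewed partition over more tasks at the cost of proportionally more files per partition.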