Tagged articles
5 articles
Big Data Tech Team
Dec 28, 2025 · Big Data

When to Use Hive Partitioning vs Bucketing: A Practical Guide

This article explains Hive's partitioning and bucketing techniques, compares their purposes, advantages, and pitfalls, and shows how to combine them with concrete SQL examples to improve query performance, reduce I/O, and optimize joins and sampling in large data warehouses.

Bucketing · Data Warehouse · Hive
0 likes · 7 min read
Big Data Technology & Architecture
Dec 31, 2024 · Big Data

Eliminating Shuffle in Spark Joins with Storage Partitioned Join (SPJ) for Iceberg Tables

This article explains how Storage Partitioned Join (SPJ), introduced in Spark 3.3, avoids costly shuffle operations when joining partitioned V2 source tables such as Apache Iceberg. It details the required conditions, configuration settings, and practical code examples, and covers join scenarios including mismatched partitions and data skew.

Bucketing · Data Skew · SQL
0 likes · 15 min read
StarRing Big Data Open Lab
Mar 3, 2017 · Big Data

Boost ETL Performance: Key Tips for Resources, Partitioning & Monitoring

ETL optimization is crucial to building a data warehouse. This guide outlines three core strategies: configuring resources properly, using data characteristics to choose partitioning and bucketing, and monitoring task execution. It covers practical principles, pitfalls, and case studies for improving ETL efficiency.

Bucketing · ETL · Partitioning
0 likes · 11 min read