Tag: Hive SQL


Baidu Geek Talk
Nov 18, 2024 · Big Data

Optimizing Multi-Dimensional User Count Statistics in Big Data Computing: A Data Tagging Approach

By replacing exponential row expansion with a data‑tagging strategy that encodes dimension combinations and aggregates at the user level, the authors cut Baidu Feed’s multi‑dimensional user‑count computation time from 49 to 14 minutes and shuffle size from 16 TB to 800 GB, enabling scalable analysis across dozens of dimensions for billions of daily users.
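The summary does not spell out the tagging scheme itself, so this sketch only makes the baseline concrete: the "exponential row expansion" produced by a full cube over n dimensions, where every input row is copied once per subset of dimensions (2^n copies) before distinct users are counted. It is a minimal in-memory Python illustration under stated assumptions. The full-cube shape, the `*` roll-up marker, and the names `expand_row` and `cube_user_counts` are hypothetical and not taken from the article, and this is not Baidu's implementation.

```python
from collections import defaultdict
from itertools import product

# Hypothetical marker meaning "rolled up over this dimension" (assumption, not from the article).
ALL = "*"


def expand_row(dims):
    """Yield all 2^n combinations for one row: each dimension keeps its value or becomes ALL."""
    for keep in product((True, False), repeat=len(dims)):
        yield tuple(v if k else ALL for v, k in zip(dims, keep))


def cube_user_counts(rows):
    """Naive baseline: expand every (user_id, dims) row, then count distinct users per combination.

    Returns the distinct-user count per combination and the number of expanded rows,
    which grows as len(rows) * 2**n_dims.
    """
    users = defaultdict(set)
    expanded = 0
    for user_id, dims in rows:
        for key in expand_row(dims):
            users[key].add(user_id)
            expanded += 1
    return {key: len(ids) for key, ids in users.items()}, expanded


rows = [
    ("u1", ("ios", "beijing", "feed")),
    ("u2", ("android", "beijing", "feed")),
    ("u1", ("ios", "shanghai", "video")),
]
counts, expanded = cube_user_counts(rows)
print(expanded)                       # 24 = 3 rows * 2**3
print(counts[(ALL, ALL, ALL)])        # 2 distinct users overall
print(counts[(ALL, "beijing", ALL)])  # 2
print(counts[("ios", ALL, ALL)])      # 1
```

With only 3 dimensions, 3 input rows already become 24 expanded rows; at dozens of dimensions the 2^n factor is the kind of blow-up that inflates shuffle size in a distributed job, which is the cost the tagging approach is described as avoiding.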

Big Data Optimization · Hive SQL · Performance Tuning
12 min read