Tag

Big Data Optimization


Baidu Geek Talk
Nov 18, 2024 · Big Data

Optimizing Multi-Dimensional User Count Statistics in Big Data Computing: A Data Tagging Approach

By replacing exponential row expansion with a data‑tagging strategy that encodes dimension combinations and aggregates at the user level, the authors cut Baidu Feed’s multi‑dimensional user‑count computation time from 49 to 14 minutes and shuffle size from 16 TB to 800 GB, enabling scalable analysis across dozens of dimensions for billions of daily users.

Big Data Optimization · Hive SQL · Performance Tuning
12 min read
DataFunSummit
Sep 27, 2022 · Big Data

Apache Spark Adaptive Query Execution and Kyuubi Optimization Practices for Data Warehousing

This article presents a detailed overview of Apache Spark's Adaptive Query Execution evolution, its optimization techniques, and performance gains, followed by an in‑depth discussion of Apache Kyuubi's architecture, security integrations, cloud‑native capabilities, and practical Rebalance + Z‑Order strategies that enhance data‑warehouse task efficiency and query performance.

Adaptive Query Execution · Apache Spark · Big Data Optimization
19 min read