Tagged articles
2 articles
Baidu Tech Salon
Feb 28, 2024 · Big Data

Design, Optimization, and Practice of Baidu's Fusion Compute Engine for Data Warehouse

Baidu's Fusion Compute Engine, built on Spark with a one-layer wide-table model, combines data skipping, push-down, code generation, vectorization, and extensive tuning. It cuts ad-hoc query latency to seconds, shrinks storage by about 30%, and accelerates ETL jobs while remaining stable at data-warehouse scale.

Baidu · Big Data · Fusion Compute Engine
10 min read
Baidu Geek Talk
Feb 28, 2024 · Big Data

How Baidu’s Fusion Compute Engine Cuts Query Time to Seconds on Petabyte‑Scale Data

This article analyzes Baidu's Fusion Compute Engine for its data warehouse. It details the architecture and optimization techniques such as data skipping, Parquet column indexing, ProjectLimit, and CodeGen, and shows how they reduce query latency to seconds while cutting storage costs by about 30% on multi-petabyte workloads.

Baidu · Big Data · Data Warehouse
12 min read