Tagged articles
1 articles
Page 1 of 1
Baidu Tech Salon
Baidu Tech Salon
Jan 13, 2015 · Big Data

Inside Spark 1.2: New APIs, In‑Memory Columnar Storage, and Baidu’s High‑Performance Shuffle

This article reviews Spark 1.2’s major enhancements—including the External Data Source API, column pruning, predicate pushdown, and in‑memory columnar storage—while also detailing Baidu’s large‑scale Spark deployments, its custom high‑performance Shuffle service, and the integration of Spark with the Tachyon memory file system.

BaiduBig DataExternal Data Source API
0 likes · 16 min read
Inside Spark 1.2: New APIs, In‑Memory Columnar Storage, and Baidu’s High‑Performance Shuffle