Baidu Tech Salon
Jan 13, 2015 · Big Data
Inside Spark 1.2: New APIs, In‑Memory Columnar Storage, and Baidu’s High‑Performance Shuffle
This article reviews Spark 1.2’s major enhancements—including the External Data Source API, column pruning, predicate pushdown, and in‑memory columnar storage—while also detailing Baidu’s large‑scale Spark deployments, its custom high‑performance Shuffle service, and the integration of Spark with the Tachyon memory file system.
BaiduBig DataExternal Data Source API
0 likes · 16 min read
