Jan 13, 2015 · Big Data

Inside Spark 1.2: New APIs, In‑Memory Columnar Storage, and Baidu’s High‑Performance Shuffle

This article reviews Spark 1.2’s major enhancements—including the External Data Source API, column pruning, predicate pushdown, and in‑memory columnar storage—while also detailing Baidu’s large‑scale Spark deployments, its custom high‑performance Shuffle service, and the integration of Spark with the Tachyon memory file system.

BaiduBig DataExternal Data Source API

0 likes · 16 min read

Inside Spark 1.2: New APIs, In‑Memory Columnar Storage, and Baidu’s High‑Performance Shuffle

In-Memory Columnar Storage

Inside Spark 1.2: New APIs, In‑Memory Columnar Storage, and Baidu’s High‑Performance Shuffle