Tagged articles
Tencent Cloud Developer
Jul 24, 2019 · Big Data

Implementing Custom Data Sources in Spark: TGSpark Data Source V2 Practice

The article explains how Tencent's TGSpark uses Spark DataSource V2 to build a custom data source for TGMars storage. It covers shard‑aware design, column and filter pushdown, columnar batch loading, and partition‑location reporting, and presents experimental results showing fewer shuffles and more local computation when executor placement matches the storage nodes.

Big Data · Column Pushdown · Custom Data Source
10 min read