Tagged articles

celeborn

8 articles · Page 1 of 1

Dec 10, 2025 · Big Data

Vivo’s 800‑Day Journey Optimizing Celeborn Remote Shuffle Service at PB Scale

This technical report details how Vivo’s big‑data platform adopted Celeborn as its remote shuffle service, evaluated alternatives, tuned hardware and software configurations, implemented performance and stability enhancements, and outlines future operational and community‑driven improvements for handling petabyte‑scale shuffle workloads.

Big DataRemote Shuffle ServiceSpark

0 likes · 20 min read

Vivo’s 800‑Day Journey Optimizing Celeborn Remote Shuffle Service at PB Scale

DataFunTalk

Feb 20, 2025 · Big Data

From Integrated Storage‑Compute to Decoupled Architecture: Practical Exploration of Kubernetes, Kyuubi, Celeborn, Blaze, and Hue in Big Data Platforms

This article analyzes the transition from a tightly coupled storage‑compute architecture to a decoupled model, detailing how Kubernetes, Kyuubi, Celeborn, Blaze, and Hue together solve resource inefficiencies, improve scalability, and boost query performance in modern big‑data environments.

Big DataBlazeKyuubi

0 likes · 16 min read

From Integrated Storage‑Compute to Decoupled Architecture: Practical Exploration of Kubernetes, Kyuubi, Celeborn, Blaze, and Hue in Big Data Platforms

dbaplus Community

Sep 4, 2024 · Big Data

How Ctrip Scaled Its Data Platform to Multi‑IDC Architecture with Spark 3, Kyuubi, and Celeborn

This article details how Ctrip’s data platform evolved from a single‑IDC design to a multi‑IDC, tiered storage and scheduling architecture, covering the challenges of rapid data growth, the migration to Spark 3 via Kyuubi, the introduction of Celeborn shuffle service, and the resulting performance and reliability gains.

Big DataDistributed storageHDFS

0 likes · 23 min read

How Ctrip Scaled Its Data Platform to Multi‑IDC Architecture with Spark 3, Kyuubi, and Celeborn

DataFunTalk

Jun 22, 2024 · Big Data

Migrating Spark Shuffle Service from ESS to RSS (Celeborn) at Zhihu: Design, Implementation, and Benefits

This article details Zhihu's migration of massive Spark and MapReduce shuffle workloads from the External Shuffle Service (ESS) to a push‑based Remote Shuffle Service (RSS) powered by Celeborn, covering background problems, evaluation of open‑source implementations, deployment architecture, encountered issues, solutions, performance gains, and future plans.

Big DataRSSShuffle

0 likes · 19 min read

Migrating Spark Shuffle Service from ESS to RSS (Celeborn) at Zhihu: Design, Implementation, and Benefits

DataFunTalk

Dec 31, 2023 · Big Data

Apache Celeborn (Incubating): Addressing Traditional Shuffle Limitations in Big Data Processing

Apache Celeborn (Incubating) is a remote shuffle service designed to overcome the inefficiencies, high storage demands, network overhead, and limited fault tolerance of traditional Spark shuffle implementations by introducing push‑shuffle, partition splitting, columnar shuffle, multi‑layer storage, and elastic, stable, and scalable architectures.

Apache SparkBig DataPerformance Optimization

0 likes · 15 min read

Apache Celeborn (Incubating): Addressing Traditional Shuffle Limitations in Big Data Processing

Zhongtong Tech

Dec 14, 2023 · Big Data

How Celeborn Transformed Spark Shuffle Performance at ZTO Express

Facing massive daily Spark shuffle volumes and unstable ETL performance, ZTO Express migrated from the community External Shuffle Service to Celeborn's Remote Shuffle Service, achieving higher disk I/O efficiency, better reliability, reduced network connections, and significant reductions in task failures and job latency.

Big DataRemote Shuffle ServiceShuffle

0 likes · 15 min read

How Celeborn Transformed Spark Shuffle Performance at ZTO Express

DataFunSummit

Nov 25, 2023 · Big Data

Practical Experience with Apache Kyuubi and Celeborn on the DXY Big Data Platform

This article presents a comprehensive technical overview of how DXY's big data platform leverages Apache Kyuubi and Celeborn to unify Spark entry points, configure flexible task isolation, implement fine‑grained AuthZ, optimize small files and Z‑Order sorting, and accelerate large result set transmission with Arrow, while also discussing operational challenges and upcoming features.

Apache KyuubiArrowBig Data

0 likes · 17 min read

Practical Experience with Apache Kyuubi and Celeborn on the DXY Big Data Platform

DataFunTalk

Aug 5, 2023 · Big Data

Apache Celeborn (Incubating): Design, Performance, Stability, and Elasticity of a Remote Shuffle Service

This article reviews the limitations of traditional Spark shuffle, introduces Apache Celeborn (Incubating) as a remote shuffle service, and details its design for performance, stability, and elasticity, including push shuffle, partition splitting, columnar shuffle, multi‑layer storage, congestion control, and real‑world evaluation.

Apache SparkBig DataShuffle Service

0 likes · 19 min read

Apache Celeborn (Incubating): Design, Performance, Stability, and Elasticity of a Remote Shuffle Service