Tag

Celeborn


DataFunTalk
Feb 20, 2025 · Big Data

From Integrated Storage‑Compute to Decoupled Architecture: Practical Exploration of Kubernetes, Kyuubi, Celeborn, Blaze, and Hue in Big Data Platforms

This article analyzes the transition from a tightly coupled storage‑compute architecture to a decoupled model, detailing how Kubernetes, Kyuubi, Celeborn, Blaze, and Hue together solve resource inefficiencies, improve scalability, and boost query performance in modern big‑data environments.

Big Data · Blaze · Celeborn
16 min read
DataFunTalk
Jun 22, 2024 · Big Data

Migrating Spark Shuffle Service from ESS to RSS (Celeborn) at Zhihu: Design, Implementation, and Benefits

This article details Zhihu's migration of massive Spark and MapReduce shuffle workloads from the External Shuffle Service (ESS) to a push‑based Remote Shuffle Service (RSS) powered by Celeborn, covering background problems, evaluation of open‑source implementations, deployment architecture, encountered issues, solutions, performance gains, and future plans.

Big Data · Celeborn · Performance
19 min read
DataFunTalk
Dec 31, 2023 · Big Data

Apache Celeborn (Incubating): Addressing Traditional Shuffle Limitations in Big Data Processing

Apache Celeborn (Incubating) is a remote shuffle service designed to overcome the inefficiency, high storage demands, network overhead, and limited fault tolerance of traditional Spark shuffle implementations. It introduces push‑based shuffle, partition splitting, columnar shuffle, and multi‑layer storage within an elastic, stable, and scalable architecture.

Apache Spark · Big Data · Celeborn
15 min read
DataFunSummit
Nov 25, 2023 · Big Data

Practical Experience with Apache Kyuubi and Celeborn on the DXY Big Data Platform

This article describes how DXY's big data platform uses Apache Kyuubi and Celeborn to unify Spark entry points, configure flexible task isolation, implement fine‑grained authorization (AuthZ), address the small‑file problem, apply Z‑Order sorting, and accelerate large result set transfer with Arrow. It also discusses operational challenges and upcoming features.

Arrow · Apache Kyuubi · Big Data
17 min read
DataFunTalk
Aug 5, 2023 · Big Data

Apache Celeborn (Incubating): Design, Performance, Stability, and Elasticity of a Remote Shuffle Service

This article reviews the limitations of traditional Spark shuffle, introduces Apache Celeborn (Incubating) as a remote shuffle service, and details its design for performance, stability, and elasticity, including push shuffle, partition splitting, columnar shuffle, multi‑layer storage, congestion control, and real‑world evaluation.

Apache Spark · Big Data · Celeborn
19 min read