From Integrated Storage‑Compute to Decoupled Architecture: Practical Exploration of Kubernetes, Kyuubi, Celeborn, Blaze, and Hue in Big Data Platforms

This article analyzes the transition from a tightly coupled storage‑compute architecture to a decoupled model, detailing how Kubernetes, Kyuubi, Celeborn, Blaze, and Hue together solve resource inefficiencies, improve scalability, and boost query performance in modern big‑data environments.

DataFunTalk

Feb 20, 2025

In the era of rapidly evolving big data technologies, the shift from an integrated storage‑compute architecture to a decoupled model offers new opportunities for enterprises.

Initially, the monolithic storage‑compute design simplified data movement for small workloads, but as data volumes exploded, its tight coupling caused resource inflexibility and scaling challenges.

The article outlines the pain points (node failures that take down storage and compute together, resources wasted between YARN and Impala, long Hive job runtimes) and proposes a set of questions to guide the redesign.

Adopting Kubernetes provides the foundation for storage‑compute separation, offering fine‑grained scheduling, automatic scaling, and a rich ecosystem. Namespaces isolate tenants, and resource limits ensure fair sharing.
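
As a rough illustration of namespace isolation with resource limits, the sketch below builds a Namespace and a ResourceQuota manifest for one tenant and prints them as JSON, which `kubectl apply -f -` accepts. The tenant name `team-etl` and the CPU and memory figures are hypothetical and do not come from the article; only the standard Kubernetes `v1` object fields are used.

```python
import json
from typing import Dict, List


def tenant_manifests(tenant: str, cpu: str, memory: str) -> List[Dict]:
    """Build a Namespace and a ResourceQuota manifest for one tenant."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": tenant},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{tenant}-quota", "namespace": tenant},
        "spec": {
            # Caps on the sum of requests and limits across all pods
            # in the namespace (hypothetical values supplied by the caller).
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "limits.cpu": cpu,
                "limits.memory": memory,
            }
        },
    }
    return [namespace, quota]


def to_kubectl_json(manifests: List[Dict]) -> str:
    """Wrap the manifests in a v1 List so they form a single JSON document."""
    wrapper = {"apiVersion": "v1", "kind": "List", "items": manifests}
    return json.dumps(wrapper, indent=2)


if __name__ == "__main__":
    print(to_kubectl_json(tenant_manifests("team-etl", cpu="64", memory="256Gi")))
```

Piping the output into `kubectl apply -f -` would create both objects. Spark driver and executor pods launched in that namespace would then count against the quota.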

Kyuubi serves as a multi‑tenant SQL gateway on top of Spark, replacing Hive on MapReduce and delivering 6‑10× speedups on large queries.
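
Because Kyuubi speaks the HiveServer2 protocol, clients usually reach it through a `jdbc:hive2://` URL. The helper below is a minimal sketch of how such a URL could be assembled, assuming Kyuubi's default frontend port 10009 and its documented convention of appending per-session Spark settings after `#`. The host name is hypothetical, and the exact URL grammar should be checked against the Kyuubi version in use.

```python
from typing import Dict, Optional


def kyuubi_jdbc_url(
    host: str,
    port: int = 10009,  # assumed: Kyuubi's default Thrift frontend port
    database: str = "default",
    spark_conf: Optional[Dict[str, str]] = None,
) -> str:
    """Build a HiveServer2-style JDBC URL pointing at a Kyuubi server.

    Per-session Spark settings, if any, are appended after '#'
    as semicolon-separated key=value pairs.
    """
    base = f"jdbc:hive2://{host}:{port}/{database}"
    if not spark_conf:
        return base
    overrides = ";".join(f"{key}={value}" for key, value in spark_conf.items())
    return f"{base};#{overrides}"


if __name__ == "__main__":
    # Hypothetical host; the override tunes one session only.
    print(kyuubi_jdbc_url(
        "kyuubi.example.internal",
        spark_conf={"spark.sql.shuffle.partitions": "200"},
    ))
```

A tool such as Hue or beeline could use the resulting string as its connection URL, which keeps per-tenant tuning out of the shared server configuration.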

Celeborn acts as an external shuffle service, enabling dynamic allocation on Kubernetes and improving Spark performance by up to three times.

Blaze, a native vectorized execution engine, accelerates Spark SQL by 20‑30% with minimal configuration.

Hue provides a user‑friendly SQL editor; deploying it as a single pod on Kubernetes simplifies access to the unified query layer.

The original article includes configuration snippets for enabling the Celeborn shuffle manager, Blaze, and Kyuubi in Spark, as well as a Helm‑based deployment of Hue. The Spark shuffle settings are reproduced here:

# Set spark.shuffle.manager exactly once: pick option A or option B.
# Option A: Celeborn shuffle manager (when not using Blaze)
spark.shuffle.manager=org.apache.spark.shuffle.celeborn.SparkShuffleManager
# Option B: Blaze shuffle manager backed by Celeborn (uncomment and remove option A)
# spark.shuffle.manager=org.apache.spark.sql.execution.blaze.shuffle.celeborn.BlazeCelebornShuffleManager

# Settings shared by both options
spark.serializer=org.apache.spark.serializer.KryoSerializer
# Celeborn master address (host left as a placeholder in the original)
spark.celeborn.master.endpoints=xxxxx:9097
# Celeborn takes over from Spark's built-in external shuffle service
spark.shuffle.service.enabled=false
spark.celeborn.client.spark.shuffle.writer=hash
spark.celeborn.client.push.replicate.enabled=false
spark.sql.adaptive.localShuffleReader.enabled=false
spark.sql.adaptive.skewJoin.enabled=true
# Dynamic allocation support with Celeborn holding the shuffle data
spark.shuffle.sort.io.plugin.class=org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO
spark.dynamicAllocation.shuffleTracking.enabled=false
spark.celeborn.quota.identity.provider=org.apache.celeborn.common.identity.HadoopBasedIdentityProvider
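
Since `spark.shuffle.manager` can take only one value per job, one way to avoid mixing the two variants is to render the properties from a single source. The following is a minimal sketch, not the authors' tooling: the function name `render_spark_conf` and the `engine` labels are hypothetical, while the property keys and class names are copied from the configuration in the article.

```python
# Class names copied from the article; the dict keys are hypothetical labels.
SHUFFLE_MANAGERS = {
    "celeborn": "org.apache.spark.shuffle.celeborn.SparkShuffleManager",
    "blaze": (
        "org.apache.spark.sql.execution.blaze.shuffle.celeborn."
        "BlazeCelebornShuffleManager"
    ),
}

# Settings the article applies regardless of which shuffle manager is chosen.
COMMON_SETTINGS = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.shuffle.service.enabled": "false",
    "spark.celeborn.client.spark.shuffle.writer": "hash",
    "spark.celeborn.client.push.replicate.enabled": "false",
    "spark.sql.adaptive.localShuffleReader.enabled": "false",
    "spark.sql.adaptive.skewJoin.enabled": "true",
    "spark.shuffle.sort.io.plugin.class": (
        "org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO"
    ),
    "spark.dynamicAllocation.shuffleTracking.enabled": "false",
    "spark.celeborn.quota.identity.provider": (
        "org.apache.celeborn.common.identity.HadoopBasedIdentityProvider"
    ),
}


def render_spark_conf(engine: str, master_endpoints: str) -> str:
    """Render key=value lines with exactly one spark.shuffle.manager entry."""
    if engine not in SHUFFLE_MANAGERS:
        raise ValueError(f"engine must be one of {sorted(SHUFFLE_MANAGERS)}")
    conf = {
        "spark.shuffle.manager": SHUFFLE_MANAGERS[engine],
        "spark.celeborn.master.endpoints": master_endpoints,
        **COMMON_SETTINGS,
    }
    return "\n".join(f"{key}={value}" for key, value in conf.items())


if __name__ == "__main__":
    # "xxxxx:9097" is the placeholder endpoint from the article, not a real host.
    print(render_spark_conf("blaze", "xxxxx:9097"))
```

Calling `render_spark_conf("celeborn", ...)` instead yields the non‑Blaze variant, and an unknown engine name fails fast rather than producing a file with a missing or duplicated shuffle manager.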

Performance tests show average query latency reductions of around 25% and significant gains in resource utilization.

In summary, the combined use of Kubernetes, Kyuubi, Celeborn, Blaze, and Hue creates a cost‑effective, scalable, and high‑performance big‑data platform.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
