How Cloud‑Native Transforms Big Data Platforms: Challenges, Solutions, and Future Trends
This article analyzes the rise of cloud‑native technologies in big data ecosystems, identifies key pain points such as resource scheduling, service capabilities, performance, and operations, and presents detailed technical explorations—including Volcano batch scheduling, Kyuubi serverless, vectorized computing, remote shuffle services, and storage‑compute separation—while outlining future development directions.
Background
Since 2020, major open‑source projects such as Spark, Kafka, and Flink have added native Kubernetes support, driving a large‑scale reconstruction of the big‑data stack toward cloud‑native architectures. Cloud‑native brings elasticity, loose coupling, and observability, which are essential for AI, 5G, edge computing, and other emerging workloads.
Key technical challenges in cloud‑native big‑data platforms
Insufficient resource scheduling: The default Kubernetes scheduler cannot satisfy the needs of high‑performance engines (e.g., Spark, Presto); problems include driver scheduling deadlocks and the absence of queue, fair‑share, and resource‑reservation scheduling.
Limited service capabilities: The traditional Spark Thrift Server supports only single‑tenant access and lacks fine‑grained permission control and serverless elasticity.
Performance overhead: Container networking and loss of data locality increase latency.
Shuffle bottleneck: Pods often have insufficient local disks, causing shuffle data loss or job failure.
Weak operational management: No unified component management, monitoring or log‑collection framework.
Technical solutions and implementation details
1. Large‑scale batch scheduling with Volcano
Since Apache Spark 3.3, Volcano can be enabled as the batch scheduler for Spark on Kubernetes. It provides gang scheduling, fair‑share, queues, preemption, resource reservation, and backfill. Example spark-submit command:
spark-submit \
--master k8s://https://$K8S_API_SERVER \
--deploy-mode cluster \
--conf spark.kubernetes.scheduler.name=volcano \
--conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep \
--conf spark.kubernetes.executor.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep \
--conf spark.kubernetes.namespace=bigdata \
--conf spark.kubernetes.container.image=my-spark-image:latest \
--conf spark.scheduler.minRegisteredResourcesRatio=0.8 \
my_app.jar

Integration enables multi‑engine batch scheduling (Spark, Flink, TensorFlow, etc.) with queue and fair scheduling, improving cluster resource utilization by 30‑40%.
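Queues and gang semantics are declared through Volcano's CRDs. Below is a minimal sketch, with illustrative names and sizes, of a weighted Queue and a PodGroup enforcing all‑or‑nothing startup; Spark 3.3+ can generate the PodGroup from such a template via spark.kubernetes.scheduler.volcano.podGroupTemplateFile.

# Weighted queue shared by big-data workloads (names and sizes are illustrative)
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: bigdata-queue
spec:
  weight: 4                 # fair-share weight relative to sibling queues
  capability:
    cpu: "200"
    memory: 800Gi
---
# Gang scheduling: driver + 4 executors are admitted together or not at all
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: spark-job-pg
  namespace: bigdata
spec:
  queue: bigdata-queue
  minMember: 5
  minResources:
    cpu: "20"
    memory: 80Gi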
2. Serverless gateway based on Kyuubi
Kyuubi is extended to support a serverless SQL gateway that unifies JDBC/SQL access to Spark SQL, Flink SQL, Hudi, Trino, and other engines. Users interact via standard JDBC without managing clusters. Key configuration snippet:
# Run the Flink SQL engine on Kubernetes (kyuubi-defaults.conf)
kyuubi.engine.type=FLINK_SQL
# Flink options are passed through with the "flink." prefix
flink.execution.target=kubernetes-application
flink.kubernetes.namespace=bigdata
flink.kubernetes.container.image=my-flink-image:latest

The gateway also adds multi‑tenant authentication and authorization, supporting row/column‑level permissions.
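From the client side the gateway behaves like a HiveServer2 endpoint, so existing JDBC tooling works unchanged; for example with Beeline (the hostname is illustrative, 10009 is Kyuubi's default frontend port):

beeline -u "jdbc:hive2://kyuubi-gateway:10009/default" -n alice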
3. Multi‑tenant permission control and engine isolation
Authentication is performed via LDAP/OIDC; authorization is enforced at the engine, database, and column level. Engine isolation can be configured at four granularity levels (see the configuration sketch after this list):
Connection‑level: each JDBC connection gets a dedicated engine instance.
User‑level: one engine per user, shared across connections of the same user.
User‑group level: engines are shared within a group.
Server‑level: a single shared engine for all users.
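These four levels map to Kyuubi's kyuubi.engine.share.level setting, and the row/column permissions mentioned above can be enforced by the Kyuubi Spark AuthZ plugin backed by Apache Ranger; a minimal sketch:

# Engine isolation: CONNECTION | USER | GROUP | SERVER
kyuubi.engine.share.level=USER
# Ranger-backed row/column-level authorization for the Spark SQL engine
spark.sql.extensions=org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension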
4. High availability and concurrency
ZooKeeper is used for service registration, leader election, and client‑side retry, enabling seamless rolling upgrades and automatic failover for the Kyuubi gateway and other components.
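A minimal HA sketch (hostnames are illustrative): the gateway registers itself in ZooKeeper through kyuubi-defaults.conf, and clients discover live instances through a ZooKeeper‑based JDBC URL.

# kyuubi-defaults.conf (recent releases; older ones use kyuubi.ha.zookeeper.quorum)
kyuubi.ha.addresses=zk-0:2181,zk-1:2181,zk-2:2181
kyuubi.ha.namespace=kyuubi

# Client-side discovery URL: a live gateway instance is resolved at connect time
jdbc:hive2://zk-0:2181,zk-1:2181,zk-2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi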
5. Vectorized execution with Gluten + Velox
Gluten provides a native columnar execution engine for Spark by offloading SQL operators to Velox. The following Spark configuration activates Gluten:
# Load Gluten as a Spark plugin (io.glutenproject.GlutenPlugin in pre-Apache releases)
spark.plugins=org.apache.gluten.GlutenPlugin
# Keep shuffle data in columnar format end to end
spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
# Velox operates on off-heap memory, which must be enabled and sized explicitly
spark.memory.offHeap.enabled=true
spark.memory.offHeap.size=16g

Benchmarks on SSD‑based Parquet workloads show a >2× speedup; running in containers reduces the gain by roughly 10‑20%. The same stack is reused for other engines where supported.
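Whether an operator was actually offloaded can be checked by inspecting the physical plan: stages executed by Velox appear as Gluten transformer nodes (e.g., ProjectExecTransformer) instead of vanilla Spark operators. The query below is illustrative:

-- In spark-sql or via JDBC: offloaded operators show up as *Transformer nodes
EXPLAIN SELECT l_returnflag, sum(l_quantity) FROM lineitem GROUP BY l_returnflag;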
6. Remote Shuffle Service (RSS) with Celeborn
Celeborn implements an independent shuffle service that buffers mapper output in memory, pushes partition data to pre‑assigned workers, and serves reducers with sequential reads, removing the dependence on executor‑local disks and reducing the number of network connections. Spark is configured to use Celeborn as follows:
# Replace the built-in shuffle manager with Celeborn's Spark client
spark.shuffle.manager=org.apache.spark.shuffle.celeborn.SparkShuffleManager
spark.celeborn.master.endpoints=celeborn-master:9097
# Celeborn replaces, and is incompatible with, the external shuffle service
spark.shuffle.service.enabled=false

In a 3‑node Kubernetes cluster the RSS integration improves Spark job runtime by 33.7% compared with the default shuffle implementation.
7. Storage‑compute separation and lake‑house support
The platform abstracts heterogeneous object stores (S3, HDFS, Ceph, OSS) behind a unified namespace and adds multi‑level caching (memory, SSD, HDD) to accelerate I/O. Apache Hudi serves as the lake‑house table format, with the following enhancements:
Multi‑client concurrent writes (MOR tables) with 10‑30% write‑throughput improvement.
Dimension‑table joins for real‑time analytics using Flink.
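Concurrent MOR writes rely on Hudi's optimistic concurrency control with an external lock provider; a sketch of the relevant writer options (the ZooKeeper endpoint is illustrative):

hoodie.datasource.write.table.type=MERGE_ON_READ
hoodie.write.concurrency.mode=optimistic_concurrency_control
# A lazy cleaner reconciles failed concurrent writes instead of blocking the table
hoodie.cleaner.policy.failed.writes=LAZY
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
hoodie.write.lock.zookeeper.url=zk-0
hoodie.write.lock.zookeeper.port=2181
hoodie.write.lock.zookeeper.base_path=/hudi/locks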
8. Operations, installation and observability
A one‑click installer (shell script) provisions a full cloud‑native big‑data stack on clusters with fewer than 10 nodes in under one hour. The monitoring stack:
Prometheus scrapes metrics from Spark, Flink, Zookeeper and other components.
Grafana visualizes dashboards and alerts.
Loki + Promtail collect container logs; logs can be queried via Grafana UI.
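For Spark, metrics can be exposed in Prometheus format without a sidecar exporter by enabling the built‑in servlet sink (available since Spark 3.0); a minimal sketch:

# Driver UI additionally serves executor metrics at /metrics/executors/prometheus
spark.ui.prometheus.enabled=true
# PrometheusServlet sink exposes driver metrics at /metrics/prometheus
spark.metrics.conf.*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
spark.metrics.conf.*.sink.prometheusServlet.path=/metrics/prometheus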
Performance evaluation
Test environment:
Kubernetes master + 4 workers, each with 64 CPU, 512 GiB memory, 1 TiB SSD + 8 TiB HDD, 10 GbE network.
Spark driver: 8 GiB memory, 4 GiB overhead, 4 CPU.
50 Spark executors: 12 GiB memory, 4 GiB overhead, 4 CPU each.
Workloads:
TPC‑H and TPC‑DS queries on Parquet files stored on SSD.
Results:
Vectorized Spark (Gluten + Velox) achieves >2× speedup on SSD‑based Parquet workloads; container overhead reduces the gain by ~10‑20%.
Enabling Celeborn RSS yields a 33.7% reduction in job runtime.
Future roadmap
Support additional OLAP engines such as Doris.
Implement data‑affinity scheduling that pre‑loads remote data into local caches and schedules compute on cache‑resident nodes.
Extend vectorized execution to Presto, Trino and other SQL engines.
Productize RSS and further integrate it with the Gluten stack for end‑to‑end acceleration.
Continue to track cloud‑native innovations (e.g., serverless frameworks, service mesh) and incorporate them into the platform.