How Cloud‑Native Scheduling Redesign Boosts Scalability and Efficiency
This article reviews the concept of cloud‑native computing, its reliance on the IaaS/PaaS/SaaS layers, and the challenges posed by stateful services, then proposes a redesigned scheduling system with storage‑aware, topology‑aware, label‑driven, and priority‑based mechanisms to improve resource utilization, scalability, and multi‑tenant performance.
Cloud native, introduced by Matt Stine in 2013, aims to unburden development and operations by designing applications to fit cloud architecture from the start, emphasizing availability, scalability, and CI/CD efficiency.
Cloud native builds on the three layers of traditional cloud computing: IaaS, PaaS, and SaaS. IaaS provides programmable, immutable infrastructure via APIs; PaaS offers composable business capabilities; SaaS delivers applications that run directly on cloud resources.
Cloud native applications are built specifically for deployment on cloud platforms, leveraging on‑demand resource allocation, elastic scaling, resilience, multi‑tenant support, and the ability to scale to thousands of nodes.
Enterprise digital transformation typically moves from monolithic to distributed architectures, though many enterprises still run monoliths because their traffic does not yet demand more.
Cloud‑native platforms are microservice‑oriented and use containers with orchestration. Stateless services are easy to schedule, but stateful services (e.g., MySQL, HDFS) lose their data when containers are destroyed, which calls for storage‑aware scheduling.
Big data and AI workloads need data‑local computation, but overlay container networks hide the physical topology, preventing locality optimizations.
Resource‑intensive workloads dynamically request CPU, GPU, memory, etc., challenging native schedulers.
Cloud Native Architecture Design Considerations
To address these challenges, the scheduling system is redesigned with the following capabilities:
Support for local storage and storage‑aware container scheduling
Network physical‑topology awareness and real‑time scheduling
Dynamic labeling and label‑based scheduling (a minimal matching sketch follows this list)
Application dependency and runtime‑parameter awareness for parameter‑driven scheduling
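As a concrete illustration of the label‑driven capability above, here is a minimal sketch of equality‑based label matching. The `Node` type, its field names, and the selector semantics are assumptions for illustration, not the platform's actual API:

```go
package main

import "fmt"

// Node is a hypothetical view of a schedulable host and its dynamic labels.
type Node struct {
	Name   string
	Labels map[string]string
}

// matchLabels reports whether a node carries every label the workload requests.
// Semantics mirror Kubernetes equality-based selectors; the real platform may
// support richer operators.
func matchLabels(n Node, selector map[string]string) bool {
	for k, v := range selector {
		if n.Labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	nodes := []Node{
		{Name: "node-1", Labels: map[string]string{"disk": "ssd", "zone": "a"}},
		{Name: "node-2", Labels: map[string]string{"disk": "hdd", "zone": "b"}},
	}
	want := map[string]string{"disk": "ssd"}
	for _, n := range nodes {
		if matchLabels(n, want) {
			fmt.Println("candidate:", n.Name)
		}
	}
}
```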
The overall platform architecture sits on top of Kubernetes and includes a configuration center, a physical resource pool, cloud storage, cloud network, and a label center.
Above these services lies the cloud scheduling system, which receives application requests, gathers metrics from the configuration, label, storage, and network services, and makes precise scheduling decisions for big‑data, AI, database, and micro‑service workloads.
The scheduling system consists of an external‑service module (parses the deployment YAML and builds the dependency graph), a parameter‑calculation module (computes final resource needs), an instance‑rendering module (produces the full application description), and a decision‑making module.
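The article does not show these modules' interfaces; one plausible reading of the four‑module design is a chain of transformations, sketched below with entirely hypothetical type and function names:

```go
package main

import "fmt"

// Hypothetical artifacts handed between the four modules.
type DeploymentSpec struct{ YAML string }
type DependencyGraph struct{ Services []string }
type ResourcePlan struct{ CPUMilli, MemMiB int }
type RenderedApp struct{ Description string }
type Placement struct{ Node string }

// parse stands in for the external-service module: it would read the
// deployment YAML and build the dependency graph.
func parse(spec DeploymentSpec) DependencyGraph {
	return DependencyGraph{Services: []string{"web", "db"}}
}

// calculate stands in for the parameter-calculation module.
func calculate(g DependencyGraph) ResourcePlan {
	return ResourcePlan{CPUMilli: 2000, MemMiB: 4096}
}

// render stands in for the instance-rendering module, producing the
// full application description.
func render(g DependencyGraph, p ResourcePlan) RenderedApp {
	return RenderedApp{Description: fmt.Sprintf("%v with %+v", g.Services, p)}
}

// decide stands in for the decision-making module.
func decide(app RenderedApp) Placement {
	return Placement{Node: "node-1"}
}

func main() {
	g := parse(DeploymentSpec{YAML: "kind: Deployment ..."})
	p := calculate(g)
	app := render(g, p)
	fmt.Println("placed on", decide(app).Node, "->", app.Description)
}
```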
The metadata module interacts with Kubernetes to maintain up‑to‑date information on resources, network topology, storage topology, service metrics, and configuration data, enabling real‑time optimal scheduling.
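The metadata module's implementation is not described; in a Kubernetes setting, a shared informer from client‑go is a common way to keep such state current without polling the API server. A minimal sketch, assuming client‑go:

```go
package main

import (
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	// In-cluster config; use clientcmd for out-of-cluster development.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	// A shared informer keeps a local, event-driven cache of node state,
	// so scheduling decisions never block on API-server round trips.
	factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
	nodeInformer := factory.Core().V1().Nodes().Informer()
	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			node := newObj.(*v1.Node)
			fmt.Println("node updated:", node.Name, node.Status.Allocatable)
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // block forever; real code would wire this into the scheduler loop
}
```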
When an application is submitted, six internal schedulers evaluate it: dependency, storage, resource, network, label, and SLA schedulers. They filter nodes, score them, and select the best node; if no node has sufficient resources, the SLA scheduler performs priority‑based pre‑emptive scheduling.
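The six schedulers' interfaces are not given in the source; the sketch below assumes a common Filter/Score contract and shows how filtering, scoring, and best‑node selection could compose. All names are illustrative:

```go
package main

import "fmt"

// Node and Pod stand in for the platform's real descriptors.
type Node struct{ Name string }
type Pod struct{ Name string }

// Scheduler is a hypothetical common interface: Filter removes infeasible
// nodes, Score ranks the survivors (higher is better).
type Scheduler interface {
	Filter(p Pod, n Node) bool
	Score(p Pod, n Node) int
}

// schedule filters every node through every scheduler, sums scores for the
// survivors, and returns the best node. If none is found, the SLA scheduler
// would fall back to priority-based preemption.
func schedule(p Pod, nodes []Node, schedulers []Scheduler) (Node, bool) {
	best, bestScore, found := Node{}, -1, false
	for _, n := range nodes {
		feasible := true
		for _, s := range schedulers {
			if !s.Filter(p, n) {
				feasible = false
				break
			}
		}
		if !feasible {
			continue
		}
		score := 0
		for _, s := range schedulers {
			score += s.Score(p, n)
		}
		if score > bestScore {
			best, bestScore, found = n, score, true
		}
	}
	return best, found
}

// resourceScheduler is a toy stand-in for one of the six schedulers.
type resourceScheduler struct{}

func (resourceScheduler) Filter(Pod, Node) bool { return true }
func (resourceScheduler) Score(Pod, Node) int   { return 1 }

func main() {
	n, ok := schedule(Pod{Name: "db-0"},
		[]Node{{Name: "node-1"}, {Name: "node-2"}},
		[]Scheduler{resourceScheduler{}})
	fmt.Println("chosen:", n.Name, "found:", ok)
}
```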
The configuration‑label center provides centralized management of configuration and label metadata via Kubernetes ConfigMaps, allowing dynamic updates that become visible to running containers.
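One way a container picks up such dynamic updates is by re‑reading a ConfigMap mounted as a volume, which the kubelet refreshes in place (environment variables and subPath mounts do not update). The mount path below is an assumed example:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// Assumed path where a ConfigMap key is mounted as a file.
const configPath = "/etc/app-config/log-level"

func main() {
	for {
		// Re-read the mounted file each cycle; the kubelet rewrites it
		// (eventually) after the ConfigMap changes, so no restart is needed.
		b, err := os.ReadFile(configPath)
		if err != nil {
			fmt.Println("read config:", err)
		} else {
			fmt.Println("current log level:", string(b))
		}
		time.Sleep(10 * time.Second) // polling; fsnotify would be event-driven
	}
}
```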
The cloud storage service (Warpdrive) offers RESTful APIs reporting real‑time storage volume usage, enabling the scheduler to monitor storage events.
The cloud network service supplies RESTful APIs for container‑to‑host IP mappings and firewall rules, giving the scheduler visibility into network state.
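The source states only that these services expose RESTful APIs; the endpoint path and JSON shape below are invented to illustrate how the scheduler might poll Warpdrive for volume usage:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// VolumeUsage is a guessed response shape for a storage-usage endpoint;
// the real Warpdrive API is not documented in the article.
type VolumeUsage struct {
	Volume    string `json:"volume"`
	UsedBytes int64  `json:"used_bytes"`
	Node      string `json:"node"`
}

// pollVolumes queries a hypothetical endpoint and decodes the usage list.
func pollVolumes(base string) ([]VolumeUsage, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(base + "/v1/volumes/usage")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var usages []VolumeUsage
	if err := json.NewDecoder(resp.Body).Decode(&usages); err != nil {
		return nil, err
	}
	return usages, nil
}

func main() {
	usages, err := pollVolumes("http://warpdrive.internal:8080") // assumed address
	if err != nil {
		fmt.Println("poll failed:", err)
		return
	}
	for _, u := range usages {
		fmt.Printf("%s on %s: %d bytes used\n", u.Volume, u.Node, u.UsedBytes)
	}
}
```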
The scheduling decision flow involves receiving resource, label, dependency, and I/O requirements, consulting metadata, and outputting the chosen physical node(s) and container priorities.
The scheduler operates in two phases: a filtering phase that selects nodes meeting all criteria (resource, port, storage, topology, labels), and a scoring phase that prefers nodes with more free resources, cached images, affinity/anti‑affinity matches, and balanced task distribution.
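The scoring criteria could combine as a weighted sum; the weights and signal names in this sketch are assumptions, not the system's actual formula:

```go
package main

import "fmt"

// nodeState is a hypothetical snapshot of the signals the scoring phase uses.
type nodeState struct {
	FreeCPUMilli int
	FreeMemMiB   int
	HasImage     bool // requested image already cached on this node
	Affinity     int  // affinity matches; negative for anti-affinity violations
	TaskCount    int  // running tasks, used to balance distribution
}

// score combines the scoring-phase criteria as a weighted sum.
// Weights are illustrative; a real scheduler would tune or normalize them.
func score(s nodeState) int {
	v := s.FreeCPUMilli/100 + s.FreeMemMiB/256 // prefer more free resources
	if s.HasImage {
		v += 50 // skip the image pull
	}
	v += 20 * s.Affinity // reward affinity, punish anti-affinity
	v -= 5 * s.TaskCount // spread tasks for balance
	return v
}

func main() {
	a := nodeState{FreeCPUMilli: 4000, FreeMemMiB: 8192, HasImage: true, TaskCount: 10}
	b := nodeState{FreeCPUMilli: 8000, FreeMemMiB: 16384, HasImage: false, TaskCount: 2}
	fmt.Println("node-a:", score(a), "node-b:", score(b))
}
```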
Priority‑based pre‑emptive scheduling is applied when higher‑priority tasks need resources, either by pre‑empting low‑priority tasks during the filtering phase or by evicting them at runtime based on actual usage.
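A minimal sketch of the filtering‑phase preemption, assuming a single CPU resource and integer priorities (both simplifications of the source's description):

```go
package main

import (
	"fmt"
	"sort"
)

type task struct {
	Name     string
	Priority int // higher value = more important
	CPUMilli int
}

// preempt picks the lowest-priority victims whose eviction frees enough CPU
// for the incoming task; it never evicts tasks at or above the incoming
// task's priority. Returns nil if preemption cannot free enough.
func preempt(running []task, needCPUMilli, incomingPrio int) []task {
	sort.Slice(running, func(i, j int) bool {
		return running[i].Priority < running[j].Priority
	})
	var victims []task
	freed := 0
	for _, t := range running {
		if t.Priority >= incomingPrio || freed >= needCPUMilli {
			break
		}
		victims = append(victims, t)
		freed += t.CPUMilli
	}
	if freed < needCPUMilli {
		return nil // not enough low-priority work to evict
	}
	return victims
}

func main() {
	running := []task{
		{"batch-1", 1, 2000}, {"batch-2", 1, 1000}, {"api", 10, 500},
	}
	for _, v := range preempt(running, 2500, 5) {
		fmt.Println("evict:", v.Name)
	}
}
```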
Data‑topology awareness is achieved by locating the storage nodes that hold the required data, selecting the least‑loaded of those physical nodes, and optionally using Unix domain sockets for fast local data transfer.
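A sketch of that data‑topology step: restrict candidates to the nodes holding a replica of the data, then pick the least loaded. The load metric and names are assumptions:

```go
package main

import "fmt"

// hostLoad is a hypothetical per-node load metric (0.0 idle, 1.0 saturated).
type hostLoad map[string]float64

// pickDataLocalNode chooses the least-loaded node among those that hold a
// replica of the required data. If the winner is the requester's own node,
// local transfer (e.g. over a Unix domain socket) can replace a network hop.
func pickDataLocalNode(replicaNodes []string, load hostLoad) (string, bool) {
	best, bestLoad, found := "", 0.0, false
	for _, n := range replicaNodes {
		l, ok := load[n]
		if !ok {
			continue // no metrics for this node; skip it
		}
		if !found || l < bestLoad {
			best, bestLoad, found = n, l, true
		}
	}
	return best, found
}

func main() {
	load := hostLoad{"node-1": 0.82, "node-2": 0.35, "node-3": 0.60}
	if n, ok := pickDataLocalNode([]string{"node-1", "node-2"}, load); ok {
		fmt.Println("schedule compute on:", n)
	}
}
```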
StarRing Big Data Open Lab
Focused on big data technology research, exploring the Big Data era | [email protected]