
How Scheduling Algorithms Power Efficient Data Center Resource Management

This article explains how modern data centers use cluster resource management systems and scheduling algorithms to place containers on machines, improve application availability, reduce costs, and satisfy diverse constraints. It also introduces Alibaba’s global scheduling algorithm competition and the details of its challenge problem.

Alibaba Cloud Native

Jun 21, 2018


Role of Scheduling Algorithms in Cluster Resource Management

Cluster Resource Management Systems (CRMS) treat an entire data center as a single compute resource. The scheduler decides on which physical machine each compute task—or, in containerized environments, each container instance—should run. Effective scheduling improves resource utilization, maintains application stability, and reduces operational costs.

Scheduling Objectives at Different Hierarchical Levels

Container Level

Guarantee that each container receives the required CPU, memory, disk, and network bandwidth.

Support special requirements such as a particular OS version or hardware feature.

Avoid resource contention by keeping “resource‑heavy” containers apart (e.g., do not place two memory‑intensive containers on the same host).
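The three container-level checks above can be sketched as a single feasibility test. This is a minimal illustration, not how any production scheduler is implemented: the `Container` and `Host` types, the label strings, and the `memory_heavy` flag are hypothetical names, and only CPU and memory are modeled (disk and network bandwidth would follow the same pattern).

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    name: str
    cpu: float                                  # cores requested
    mem_gb: float                               # memory requested, in GB
    required_labels: frozenset = frozenset()    # hypothetical, e.g. {"os=linux"}
    memory_heavy: bool = False                  # hypothetical contention flag


@dataclass
class Host:
    cpu_free: float
    mem_free_gb: float
    labels: frozenset = frozenset()
    containers: list = field(default_factory=list)


def fits(host: Host, c: Container) -> bool:
    """Return True if the host can accept the container."""
    # 1. Enough free CPU and memory.
    if c.cpu > host.cpu_free or c.mem_gb > host.mem_free_gb:
        return False
    # 2. Every special requirement must be offered by the host.
    if not c.required_labels <= host.labels:
        return False
    # 3. Keep memory-heavy containers apart (assumed policy: at most one per host).
    if c.memory_heavy and any(x.memory_heavy for x in host.containers):
        return False
    return True
```

A scheduler would typically run a check like this as a filter over all hosts before ranking the hosts that remain.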

Application Level

Deploy multiple instances of an application across different hosts, racks, rooms, or data centers to achieve high availability.

Distribute instances geographically to mitigate correlated failures (disaster recovery).

Allow custom policies such as ordered instance launch, data‑locality preferences, or other constraints.
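One simple way to spread an application's instances across failure domains is to prefer the domain that currently holds the fewest of them. The sketch below assumes the domain is a rack and that candidates arrive as `(host_id, rack_id)` pairs; both the function name and the data layout are illustrative only.

```python
from collections import Counter


def pick_host_spread(candidates, placed_racks):
    """Pick the candidate host whose rack holds the fewest instances of the app.

    candidates:   list of (host_id, rack_id) pairs that passed feasibility checks
    placed_racks: rack ids where instances of this application already run
    """
    per_rack = Counter(placed_racks)
    # min() returns the first candidate among ties, so ordering is deterministic.
    host_id, _rack = min(candidates, key=lambda hr: per_rack[hr[1]])
    return host_id
```

The same idea extends to rooms or data centers by swapping the rack id for a coarser domain id.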

Data‑Center Level

Pack more workloads onto fewer servers, thereby saving hardware, power, cooling, and floor space.

Handle fairness, inter‑application interference, and fine‑grained resource controls (e.g., hyper‑threading, memory‑bandwidth limits).
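Packing workloads onto fewer servers is a form of bin packing. A classic heuristic for it is first-fit decreasing, shown here for a single resource dimension. It is a textbook baseline chosen for illustration, not the method used by any scheduler mentioned in this article.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack items into as few bins as the heuristic finds.

    sizes:    resource demand of each workload (one dimension only)
    capacity: capacity of every server (all servers assumed identical)
    Returns a list of bins, each a list of the sizes placed in it.
    """
    bins = []   # contents of each opened server
    free = []   # remaining capacity of each opened server
    for s in sorted(sizes, reverse=True):      # largest workloads first
        if s > capacity:
            raise ValueError(f"item of size {s} exceeds capacity {capacity}")
        for i, f in enumerate(free):
            if s <= f:                         # first server with room wins
                bins[i].append(s)
                free[i] -= s
                break
        else:                                  # no opened server had room
            bins.append([s])
            free.append(capacity - s)
    return bins
```

For example, `first_fit_decreasing([4, 8, 1, 4, 2, 1], 10)` fills two servers. Real workloads have several resource dimensions plus placement constraints, which is what makes the production problem much harder.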

Alibaba Global Scheduling Algorithm Challenge – Problem Overview

The competition models a realistic production environment with three major constraint categories.

1. Resource Constraints

Each instance specifies time‑varying CPU and memory requirements over a 24‑hour period, represented as a curve with 98 sampling points. The curves are derived from historical usage of long‑running services (e.g., e‑commerce platforms) and repeat daily.
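Because demand varies over the day, a host's capacity has to hold at every sampling point rather than only against each instance's peak. A minimal sketch of that check follows, assuming each curve is a plain list of equal length and the capacity is a single number for one resource; the function name is hypothetical.

```python
def fits_over_time(capacity, placed_curves, new_curve):
    """Return True if adding new_curve keeps total demand within capacity
    at every sampling point.

    placed_curves: demand curves of instances already on the host
    new_curve:     demand curve of the instance being placed
    """
    if any(len(c) != len(new_curve) for c in placed_curves):
        raise ValueError("all curves must have the same number of points")
    for t in range(len(new_curve)):
        total = new_curve[t] + sum(c[t] for c in placed_curves)
        if total > capacity:
            return False
    return True
```

Two instances whose peaks fall at different times can share a host even when the sum of their peaks exceeds the host's capacity, which is why per-point checking allows tighter packing than peak-based reservation.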

2. High‑Availability Constraints (P, M, PM)

Critical applications are labeled P, M, or PM. The scheduler limits the number of instances of each label that may coexist on a single machine, reducing the impact of a host failure on important services.
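A per-machine label limit can be checked with a simple count. In the sketch below the `limits` mapping and its values are made up for illustration, since the article does not state the actual limits; unlabeled instances are represented as `None`.

```python
from collections import Counter


def violates_label_limit(host_labels, new_label, limits):
    """Return True if placing an instance with new_label would exceed
    the per-machine limit for that label.

    host_labels: labels of instances already on the host (None if unlabeled)
    new_label:   label of the instance being placed, or None
    limits:      hypothetical mapping such as {"P": 2, "M": 1, "PM": 1}
    """
    if new_label is None or new_label not in limits:
        return False                      # unlabeled instances are unrestricted
    count_after = Counter(host_labels)[new_label] + 1
    return count_after > limits[new_label]
```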

3. Anti‑Affinity Constraints

Expressed as <App1, App2, k>: if a host already runs an instance of App1, it may host at most k instances of App2. This models observed interference between specific application pairs.
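The rule can be checked by counting applications on the host as they would be after the placement. This sketch assumes rules are given as `(app1, app2, k)` tuples. It treats a rule where both names are the same application like any other rule, which is an assumption and may not match the competition's exact scoring.

```python
from collections import Counter


def violates_anti_affinity(host_apps, new_app, rules):
    """Return True if the host would break any <App1, App2, k> rule
    after new_app is placed on it.

    host_apps: application name of each instance already on the host
    new_app:   application name of the instance being placed
    rules:     iterable of (app1, app2, k) tuples
    """
    counts = Counter(host_apps)
    counts[new_app] += 1                  # state of the host after placement
    for app1, app2, k in rules:
        # The rule only applies when App1 is present on the host.
        if counts[app1] > 0 and counts[app2] > k:
            return True
    return False
```

Checking the post-placement state covers both directions: adding one more App2 instance next to App1, and adding App1 to a host that already runs more than k instances of App2.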

Optimization Objective

The goal is to keep each machine’s resource utilization within a predefined safe range (leaving a margin for load spikes) while minimizing the total number of machines that host containers. In the challenge, migrations are cost‑free, which simplifies the objective compared with production, where migration incurs overhead.
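A candidate placement can be scored against this objective by counting the machines in use after verifying that none exceeds the safe limit. The sketch below is a simplification under stated assumptions: one resource, a single load value per container, identical machines, and a `safe_fraction` parameter standing in for the safe range, whose real value the article does not give.

```python
def machines_used(assignment, capacity, safe_fraction):
    """Count machines that host at least one container.

    assignment:    mapping of host id -> list of container loads
    capacity:      capacity of every machine (assumed identical)
    safe_fraction: assumed upper bound on utilization, e.g. 0.5 for 50%
    Raises ValueError if any machine exceeds the safe limit.
    """
    limit = capacity * safe_fraction
    used = 0
    for host, loads in assignment.items():
        if not loads:
            continue                      # empty machines do not count
        if sum(loads) > limit:
            raise ValueError(f"host {host} exceeds safe limit {limit}")
        used += 1
    return used
```

A solver would compare candidate placements by this count and keep the feasible one with the lowest value.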

Scale of the Testbed

Approximately 6,000 host machines.

About 68,000 container instances (a mix of already‑deployed and pending instances).

All three constraint types are present.

Additional Practical Considerations

Fairness among applications.

Inter‑application interference and shared resource throttling.

Fine‑grained allocation such as hyper‑threading and memory‑bandwidth caps.

These factors are reflected in Alibaba’s production scheduler Sigma, which implements a highly complex rule set.

