How Alibaba’s Co‑Location (Mixed‑Deployment) Cuts Costs and Boosts Utilization
Alibaba’s mixed‑deployment (Co‑location) technology combines online services and batch compute tasks on shared physical resources, using priority‑based scheduling, resource isolation, and dynamic memory management to dramatically improve CPU utilization, cut infrastructure costs, and maintain service level objectives during peak traffic events.
Mixed‑Deployment (Co‑location) Overview
Mixed‑deployment, or Co‑location, mixes different types of workloads on the same physical resources, scheduling online services and batch compute tasks together while ensuring service‑level objectives (SLOs) and significantly reducing costs.
Background
During major traffic peaks such as Alibaba’s Double‑11 shopping festival, massive compute resources are required, yet they remain idle for most of the day. Global server CPU utilization is only 6‑12%, and even with virtualization it reaches merely 7‑17%. Alibaba’s online services average about 10% utilization.
Conversely, big‑data processing frameworks (Hadoop, Spark, Flink, TensorFlow, etc.) generate high‑CPU workloads that peak at night, often exceeding 50‑60% CPU usage. These workloads are typically isolated in separate clusters.
Motivation for Mixing Clusters
Just as tidal traffic patterns cause directional congestion, online services experience low load at night and high load during the day, while batch jobs show the opposite pattern. By allowing low‑priority batch tasks to run on idle online‑service resources, overall utilization improves dramatically.
Key Characteristics of Mixed‑Deployment
Priority Separation: Low‑priority batch tasks can be pre‑empted without affecting high‑priority online services.
Resource Complementarity: Online services are busy during the day and lightly loaded at night, while batch jobs peak at night, so the two load curves complement each other on shared machines.
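The two characteristics above can be illustrated with a toy calculation. This is a minimal sketch: the hourly load figures are invented for illustration and are not Alibaba measurements, and the names ONLINE, BATCH, and colocated_load are hypothetical. Online demand is always served first; batch work only receives whatever capacity is left over.

```python
# Hypothetical hourly CPU demand as a fraction of cluster capacity (24 values).
# Illustrative assumptions only, not figures from the article.
ONLINE = [0.05] * 8 + [0.30] * 14 + [0.05] * 2   # busy during the day
BATCH = [0.50] * 8 + [0.10] * 14 + [0.50] * 2    # busy at night


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def colocated_load(online, batch, capacity=1.0):
    """Hourly load when batch work fills what online services leave free.

    Online demand is granted first; batch is capped at the remaining
    capacity, which models low-priority tasks yielding to online services.
    """
    combined = []
    for o, b in zip(online, batch):
        batch_granted = min(b, max(capacity - o, 0.0))
        combined.append(o + batch_granted)
    return combined


if __name__ == "__main__":
    print("online only:", round(mean(ONLINE), 3))
    print("co-located: ", round(mean(colocated_load(ONLINE, BATCH)), 3))
```

With these made-up curves the co-located average is well above the online-only average, and in any hour where online demand approaches full capacity the batch share shrinks toward zero.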
Cost Savings Example
Assuming a data center with N servers and a fixed total workload, raising average utilization from R1 to R2 frees X = N * (R2 - R1) / R2 servers, because the same work must fit on the remaining N - X machines:
N*R1 = (N-X)*R2
=> X*R2 = N*R2 - N*R1
=> X = N*(R2-R1)/R2
For 100,000 servers, raising utilization from 28% to 40% saves roughly 30,000 machines, equating to about 600 million RMB in cost.
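The arithmetic above can be checked with a few lines of Python. This is a small sketch of the formula X = N*(R2-R1)/R2; the function name servers_saved and its input validation are illustrative choices, not part of the original article.

```python
def servers_saved(n_servers: int, r1: float, r2: float) -> float:
    """Servers freed when average utilization rises from r1 to r2.

    Derived from n_servers * r1 == (n_servers - x) * r2, i.e. the total
    amount of work stays constant while utilization per server goes up.
    """
    if not 0 < r1 <= r2 <= 1:
        raise ValueError("expected 0 < r1 <= r2 <= 1")
    return n_servers * (r2 - r1) / r2


if __name__ == "__main__":
    # The article's example: 100,000 servers, 28% -> 40% utilization.
    print(round(servers_saved(100_000, 0.28, 0.40)))
```

Note that the saving is expressed relative to the new utilization R2, so the same 12-point improvement frees fewer servers when it starts from a higher baseline.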
Historical Timeline
2014: Technical feasibility studies and design.
2015: Testing environment setup; identified scheduling, isolation, storage, and memory challenges.
2016: Small‑scale production validation with ~200 nodes.
2017: Full‑scale production; ~20% of Double‑11 traffic ran on mixed‑deployment clusters.
Architecture of Mixed‑Deployment Scheduling
Two independent schedulers run side‑by‑side:
Sigma: Manages online service containers, compatible with Kubernetes APIs and Alibaba’s OCI‑compatible Pouch containers.
Fuxi: Handles massive data‑processing jobs, supporting MapReduce‑style pipelines, high parallelism, and fault tolerance.
A coordination layer, referred to as the zero layer, sits beneath both schedulers and mediates resource allocation between Sigma and Fuxi.
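One way to picture the arbitration between the two schedulers is the sketch below. It is not Alibaba's implementation: the Coordinator class, the Allocation type, and the rule "online first, batch gets the remainder" are assumptions chosen to match the priority model described in this article, and the real interfaces of Sigma and Fuxi are not shown in the source.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    online_cores: int
    batch_cores: int


class Coordinator:
    """Hypothetical stand-in for the layer that arbitrates between schedulers."""

    def __init__(self, total_cores: int) -> None:
        if total_cores < 0:
            raise ValueError("total_cores must be non-negative")
        self.total_cores = total_cores

    def arbitrate(self, online_request: int, batch_request: int) -> Allocation:
        # Online demand is satisfied first, capped by what the host has.
        online = min(max(online_request, 0), self.total_cores)
        # Batch receives only what is left over after the online grant.
        batch = min(max(batch_request, 0), self.total_cores - online)
        return Allocation(online, batch)


if __name__ == "__main__":
    host = Coordinator(total_cores=64)
    print(host.arbitrate(online_request=16, batch_request=100))  # quiet period
    print(host.arbitrate(online_request=60, batch_request=100))  # traffic peak
```

Calling arbitrate again with a larger online request shrinks the batch grant, which is the point at which a real system would have to throttle or pre-empt batch tasks.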
Resource Isolation Mechanisms
CPU Scheduling Optimization: CGroup priority settings allow high‑priority tasks to pre‑empt low‑priority ones, and interference between hyper‑threads sharing a physical core is avoided.
L3 Cache Isolation: Uses Intel CAT (Cache Allocation Technology) to limit the cache usage of low‑priority tasks.
Memory Bandwidth Isolation: Monitors memory bandwidth and adjusts CFS bandwidth control to favor high‑priority workloads.
Memory Protection: Memory reclamation is handled separately per CGroup, and the OOM killer targets low‑priority batch tasks first.
IO Isolation: File‑level bandwidth caps, metadata throttling, and tiered bandwidth sharing (gold, silver, bronze).
Network Flow Control: Host‑level bandwidth isolation with Linux traffic control (TC) and container‑level bandwidth sharing.
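The memory-protection rule in the list above (batch tasks are killed before online services) can be sketched as a victim-selection function. This is an illustration under stated assumptions, not the kernel's or Alibaba's actual logic: the Task type, the numeric priority field (larger means more important), and the tie-break on memory size are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Task:
    name: str
    priority: int  # assumption: larger number = more important
    rss_mb: int    # resident memory in MiB


def pick_oom_victim(tasks: Iterable[Task]) -> Optional[Task]:
    """Choose the task to kill when the host runs out of memory.

    Lowest priority goes first; among equal priorities, the task holding
    the most memory is chosen (an assumed tie-break that frees the most).
    """
    candidates = list(tasks)
    if not candidates:
        return None
    return min(candidates, key=lambda t: (t.priority, -t.rss_mb))


if __name__ == "__main__":
    running = [
        Task("web-frontend", priority=10, rss_mb=4000),
        Task("batch-a", priority=1, rss_mb=500),
        Task("batch-b", priority=1, rss_mb=900),
    ]
    print(pick_oom_victim(running))
```

In this example the online task is never selected while any batch task is still running, even though it holds the most memory.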
Future Directions
Mixed‑deployment will evolve toward finer‑grained scheduling, support for GPUs and FPGAs, scaling to million‑core clusters, and deeper integration of machine‑learning‑driven resource profiling. The goal is to make mixed‑deployment a universal scheduling capability across all resource types.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
