How Alibaba’s Mixed‑Deployment Cuts Costs and Boosts Resource Utilization
This article explains Alibaba's mixed‑deployment (co‑location) technique, detailing its motivation, architecture, resource‑sharing mechanisms, scheduling strategies, performance results, and future directions for scaling and refining resource utilization across online and offline workloads.
1. Alibaba Mixed‑Deployment Overview
Mixed‑deployment (co‑location) aims to balance growing business demand with rising resource costs by reusing existing resources to support multiple workloads.
1.1 Why Mixed‑Deployment?
Rapid growth of e‑commerce traffic, especially during large promotions like Double‑11, creates massive, pulse‑like resource pressure, while data‑center utilization remains low (around 10% for online services).
1.2 What Is Co‑Location?
Co‑location integrates different business types onto shared physical resources, providing each with its own logical quota while allowing controlled competition for resources.
1.3 Online‑Offline Mixed Deployment
Online workloads (transactions, payments) require real‑time, non‑degradable performance, whereas offline workloads (batch computation, reporting) tolerate latency and can be retried, making them suitable for resource sharing.
1.4 Exploration Timeline
2014: Concept introduced.
2015: Offline testing and prototypes.
2016: First production rollout on ~200 machines.
2017: Scale‑up to thousands of machines, supporting Double‑11.
2018: Goal to reach ten‑thousand‑node clusters.
1.5 Achievements
Mixed‑deployment clusters of several thousand nodes raised CPU utilization from about 10% to 40%, supported tens of thousands of transactions per second, and kept the impact on online services under 5%.
2. Mixed‑Deployment Architecture
The architecture consists of four layers: infrastructure, resource pool, scheduling, and business‑level control. Existing online (Sigma) and offline (Fuxi) schedulers are coordinated by a unified "0‑layer" scheduler that arbitrates resource allocation.
2.1 Overall Architecture
Bottom layer: unified data‑center hardware.
Resource layer: shared resource pool.
Scheduling layer: Sigma (online) and Fuxi (offline), coordinated by the 0‑layer.
Business control layer: policies for deployment, monitoring, and decision making.
2.2 Online Deployment Strategy
Online services are packaged into isolated transaction units, allowing safe, incremental mixed‑deployment trials while preserving service isolation.
2.3 Resource Allocation
CPU is time‑sliced between online and offline tasks, with online tasks given higher priority. Memory is over‑committed: memory that online services leave idle is allocated to offline jobs, with safeguards. Disk I/O is throttled for offline workloads to protect online latency.
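The memory over‑commit rule can be sketched as a small calculation. This is only an illustration: the article does not describe the actual safeguards, so the `NodeMemory` fields, the `offline_memory_budget` function, and the 20% safety margin are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class NodeMemory:
    total_gb: float         # physical memory on the node
    online_quota_gb: float  # memory reserved for online services
    online_used_gb: float   # memory online services use right now


def offline_memory_budget(node: NodeMemory, safety_margin: float = 0.2) -> float:
    """Memory (GB) that may be lent to offline jobs on this node.

    Offline jobs get the memory outside the online quota plus the idle part
    of the online quota, minus a margin held back for online bursts.
    The margin is a hypothetical safeguard, not a value from the article.
    """
    unreserved = node.total_gb - node.online_quota_gb
    idle_online = max(node.online_quota_gb - node.online_used_gb, 0.0)
    lendable_idle = idle_online * (1.0 - safety_margin)
    return max(unreserved + lendable_idle, 0.0)


node = NodeMemory(total_gb=256, online_quota_gb=192, online_used_gb=64)
budget = offline_memory_budget(node)
print(f"offline jobs may use about {budget:.1f} GB")
```

When online usage rises toward its quota, the idle term shrinks to zero and offline jobs are left with only the unreserved memory.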
2.4 Promotion‑Time Resource Yielding (Fast‑Up/Fast‑Down)
During large promotions, offline workloads yield resources so that online capacity can be scaled up quickly (fast‑up); after the event, online capacity is scaled back down (fast‑down) and the resources are released to offline workloads again.
2.5 Daily Resource Yielding (Time‑Sharing)
Online traffic follows a diurnal pattern; resources are dynamically re‑allocated to offline jobs during low‑traffic periods.
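A rough sketch of time‑sharing, assuming the online load for each period is known as a fraction of node CPU. The function name `daily_offline_plan` and the 15% headroom kept free for online bursts are hypothetical choices, not values from the article.

```python
from typing import List


def daily_offline_plan(online_load: List[float], headroom: float = 0.15) -> List[float]:
    """Return the CPU fraction offered to offline jobs for each period.

    online_load holds the expected online CPU load per period (0.0 to 1.0).
    Whatever is left after the online load and the headroom goes to offline
    jobs; the result never drops below zero.
    """
    plan = []
    for load in online_load:
        if not 0.0 <= load <= 1.0:
            raise ValueError("online load must be between 0 and 1")
        plan.append(round(max(1.0 - load - headroom, 0.0), 2))
    return plan


# Night trough, daytime plateau, evening peak (illustrative numbers only).
print(daily_offline_plan([0.1, 0.5, 0.9]))
```

During the low‑traffic period most of the node goes to offline jobs, and at the peak the offline share falls to zero.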
3. Core Technologies
3.1 Kernel Isolation
Isolation is enforced via cgroups for CPU, memory, I/O, and network, with priority rules that favor online services.
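As an illustration of priority rules expressed through cgroups, the sketch below builds values for the standard Linux cgroup v2 interface files `cpu.weight`, `memory.max`, and `io.max`. The article does not say which cgroup version or kernel changes Alibaba uses, so the specific weights, the 50 MiB/s I/O cap, and the `8:0` block device are assumptions. Network isolation is left out, and the function only returns the values; it does not write to `/sys/fs/cgroup`.

```python
from typing import Dict


def cgroup_v2_settings(role: str, memory_max_bytes: int, device: str = "8:0") -> Dict[str, str]:
    """Return cgroup v2 file contents for an 'online' or 'offline' group."""
    if role == "online":
        return {
            "cpu.weight": "10000",                    # highest CPU weight
            "memory.max": str(memory_max_bytes),
            "io.max": f"{device} rbps=max wbps=max",  # no I/O throttle
        }
    if role == "offline":
        limit = 50 * 1024 * 1024                      # hypothetical 50 MiB/s cap
        return {
            "cpu.weight": "1",                        # lowest CPU weight
            "memory.max": str(memory_max_bytes),
            "io.max": f"{device} rbps={limit} wbps={limit}",
        }
    raise ValueError(f"unknown role: {role}")


for name, value in cgroup_v2_settings("offline", 32 * 1024**3).items():
    print(f"{name}: {value}")
```

With these weights, offline tasks receive CPU time mostly when online tasks leave it idle, which matches the priority rule described above.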
3.2 Resource Scheduling
Online scheduler Sigma performs application profiling, bin‑packing, affinity/anti‑affinity, and auto‑scaling. Offline scheduler Fuxi handles batch job placement, memory over‑commit, and graceful degradation.
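Bin‑packing with anti‑affinity can be illustrated with a first‑fit‑decreasing heuristic. This is a generic textbook approach, not Sigma's actual algorithm, and the function name `place_instances` and the single CPU dimension are simplifying assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple


def place_instances(
    instances: List[Tuple[str, int]], node_capacity: int, node_count: int
) -> Dict[int, Optional[int]]:
    """Place (app_name, cpu_cores) instances onto nodes.

    Largest instances are placed first (first-fit decreasing). Anti-affinity
    rule: two instances of the same app never share a node. Returns a map
    from instance index to node index, or None if no node fits.
    """
    free = [node_capacity] * node_count
    apps_on_node: List[Set[str]] = [set() for _ in range(node_count)]
    placement: Dict[int, Optional[int]] = {}
    order = sorted(range(len(instances)), key=lambda i: instances[i][1], reverse=True)
    for i in order:
        app, cpu = instances[i]
        placement[i] = None
        for n in range(node_count):
            if cpu <= free[n] and app not in apps_on_node[n]:
                free[n] -= cpu
                apps_on_node[n].add(app)
                placement[i] = n
                break
    return placement


demo = [("cart", 4), ("cart", 4), ("pay", 6), ("search", 2)]
print(place_instances(demo, node_capacity=8, node_count=3))
```

In the demo, the two `cart` instances end up on different nodes even though one node would have had room for both, which is the anti‑affinity rule at work.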
3.3 Unified 0‑Layer Scheduler
The 0‑layer arbitrates between Sigma and Fuxi, ensuring fair sharing while respecting online priority.
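The arbitration idea can be sketched as a function that serves the online request first and gives offline the remainder. The article does not describe the 0‑layer's real interface, so `Grant`, `arbitrate`, and the whole‑core accounting are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    online_cores: int      # cores granted to the online scheduler
    offline_cores: int     # cores granted to the offline scheduler
    offline_to_evict: int  # offline cores that must be given back


def arbitrate(capacity: int, online_request: int, offline_request: int,
              offline_running: int) -> Grant:
    """Split node capacity between the two schedulers, online first."""
    online = min(online_request, capacity)
    offline = min(offline_request, capacity - online)
    # If offline currently holds more than its new grant, the excess is reclaimed.
    evict = max(offline_running - offline, 0)
    return Grant(online, offline, evict)


print(arbitrate(capacity=64, online_request=48, offline_request=40, offline_running=30))
print(arbitrate(capacity=64, online_request=16, offline_request=40, offline_running=30))
```

In the first call the online request leaves only 16 cores, so 14 of the 30 cores offline is running on have to be returned; in the second call both requests fit and nothing is evicted.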
4. Future Outlook
Mixed‑deployment will evolve toward larger scale (ten‑thousand‑node clusters), broader workload and hardware support, and finer‑grained resource profiling and real‑time scheduling.