Inside Alibaba’s Double‑11 Scaling: Unified Scheduler, Mixed‑Deployment & Cloud‑Native Architecture
Over nine years of Double‑11, Alibaba’s transaction volume grew 280‑fold. That growth prompted a unified scheduling system (Sigma), mixed deployment of online and offline tasks via Sigma and Fuxi, and a cloud‑native architecture that uses PouchContainer, resource isolation, and dynamic scaling to cut IT costs and raise resource utilization.
Unified Scheduling System (Sigma)
Sigma is Alibaba’s cluster scheduler launched in 2011 for online services. It consists of three cooperating layers:
Alikernel – a kernel‑level agent deployed on every physical machine that augments the Linux kernel to allocate CPU time slices and resources according to configurable priorities and policies.
SigmaSlave – runs on the host to perform container‑level CPU allocation, handle emergency scenarios, and make rapid decisions for latency‑sensitive tasks.
SigmaMaster – the global controller that stores scheduling requests, matches resource requirements, and runs optimization algorithms to coordinate thousands of machines.
The system follows an end‑state design: requests are persisted, the scheduler matches requirements, and slaves enact local deployments, achieving strong coordination and eventual consistency. Sigma was rewritten in Go in 2016 and added Kubernetes API compatibility in 2017.
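The end‑state idea can be illustrated with a small reconciliation loop. This is a minimal sketch, not Sigma's actual code: the `Host` class, the `reconcile` function, and the "most free CPU first" placement rule are all hypothetical names and choices made for illustration. The point it shows is that the controller only compares the persisted desired state with what hosts currently run, and emits the steps needed to converge.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    # Hypothetical host model: free CPU cores and running replicas per app.
    name: str
    free_cpu: int
    containers: dict = field(default_factory=dict)  # app -> replica count


def reconcile(desired: dict, cpu_per_replica: dict, hosts: list) -> list:
    """Compare the persisted desired state with what the hosts run and
    return the (host, app, action) steps taken to converge."""
    actions = []
    for app, want in desired.items():
        have = sum(h.containers.get(app, 0) for h in hosts)
        need = cpu_per_replica[app]
        # Scale up: place each missing replica on the host with most free CPU.
        while have < want:
            host = max(hosts, key=lambda h: h.free_cpu)
            if host.free_cpu < need:
                break  # no capacity right now; a later pass can retry
            host.free_cpu -= need
            host.containers[app] = host.containers.get(app, 0) + 1
            actions.append((host.name, app, "start"))
            have += 1
        # Scale down: stop surplus replicas and return their CPU.
        while have > want:
            host = next(h for h in hosts if h.containers.get(app, 0) > 0)
            host.containers[app] -= 1
            host.free_cpu += need
            actions.append((host.name, app, "stop"))
            have -= 1
    return actions
```

Calling `reconcile` a second time with the same desired state returns an empty list, which is the property an end‑state design relies on: repeating a pass is harmless, so a failed or interrupted pass can simply be retried.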
Mixed‑Deployment Architecture
Since 2014 Alibaba has run both online (long‑lived, latency‑sensitive) and offline (short‑lived, high‑throughput) workloads on the same physical machines. Sigma launches online services as PouchContainer containers, while the Fuxi scheduler places offline compute jobs on the same hosts. Because offline tasks use CPU cycles that online services leave idle, average CPU utilization rises from ~10 % to >40 % while the latency impact on online services stays below 5 %.
Key Technologies for Mixed Deployment
Kernel‑Level Resource Isolation
Noise Clean isolates hyper‑threading (HT) resources so that offline tasks do not interfere with the sibling hardware threads that online tasks are using.
Task Preempt added to the CFS scheduler raises priority for online tasks.
Cache Allocation Technology (CAT), available on Broadwell and later CPUs, partitions the last‑level cache (LLC) to cap how much of it offline workloads can occupy.
CGroup isolation with OOM priority, together with Bandwidth Control, limits offline bandwidth consumption.
Memory elasticity allows offline tasks to exceed their memcg limits when online memory is idle and forces prompt release when online tasks need memory.
Network QoS tags (gold, silver, bronze) enforce hierarchical bandwidth guarantees.
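The memory‑elasticity item in the list above can be sketched as a small calculation. This is an illustrative model only, not the kernel mechanism Alibaba uses: the function names, the safety margin, and the gigabyte figures in the usage note are assumptions. It shows the two halves of the behavior: the offline limit grows while online memory sits idle, and the amount to reclaim is whatever offline usage exceeds the limit once online demand returns.

```python
def offline_memory_limit(offline_base_gb: int, online_reserved_gb: int,
                         online_used_gb: int, margin_gb: int = 4) -> int:
    """Offline limit = its base quota plus idle online memory.

    margin_gb is a hypothetical headroom kept free for the online
    side so that a sudden spike does not wait on reclaim.
    """
    idle_online = max(0, online_reserved_gb - online_used_gb - margin_gb)
    return offline_base_gb + idle_online


def memory_to_reclaim(offline_used_gb: int, new_limit_gb: int) -> int:
    """How much offline memory must be released to fit under the new limit."""
    return max(0, offline_used_gb - new_limit_gb)
```

For example, with a 32 GB offline base and 96 GB reserved for online services, the offline limit is 84 GB while online uses 40 GB, and it falls back to 32 GB when online usage reaches 95 GB. An offline job holding 80 GB at that moment would have to release 48 GB.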
Online Cluster Management
Profiling of application memory, CPU, network, and I/O to build resource‑usage models and perform time‑based correlation analysis for optimal placement.
Affinity/anti‑affinity rules combined with task priority decide co‑location of applications to maximize throughput.
Two scheduling strategies: “stability‑first” for peak events (e.g., Double‑11) flattens allocation to keep all resources above a minimum level; “utilization‑first” for normal operation pushes used resources to the highest level, freeing capacity for large‑scale compute.
Support for automatic scaling, vertical scaling, and time‑slice multiplexing.
Rapid site‑wide scaling and elastic memory techniques to handle sudden load spikes.
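The two scheduling strategies in the list above differ mainly in how they rank candidate hosts. The sketch below is a simplified illustration, assuming a single CPU dimension and a hypothetical `pick_host` function; the real scheduler also weighs profiling data, affinity rules, and priorities. Stability‑first spreads load by choosing the host with the most headroom, while utilization‑first packs load by choosing the tightest fit, which leaves whole hosts free for large compute jobs.

```python
def pick_host(free_cpu: dict, demand: int, strategy: str):
    """Return the host name chosen for a task needing `demand` cores,
    or None if no host has enough free CPU."""
    candidates = {h: f for h, f in free_cpu.items() if f >= demand}
    if not candidates:
        return None
    if strategy == "stability-first":
        # Spread: the host with the most free CPU keeps every host
        # above a minimum level of spare capacity.
        return max(candidates, key=candidates.get)
    if strategy == "utilization-first":
        # Pack: the tightest fit fills busy hosts first and frees
        # capacity elsewhere.
        return min(candidates, key=candidates.get)
    raise ValueError(f"unknown strategy: {strategy}")
```

With free CPU of 16, 6, and 10 cores on hosts `a`, `b`, and `c`, a 4‑core task goes to `a` under stability‑first and to `b` under utilization‑first.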
PouchContainer and Containerization Progress
PouchContainer is Alibaba’s internal container engine, originally built on LXC in 2011 and extended in 2015 to be compatible with Docker images. It incorporates Alibaba‑specific kernel patches for strong isolation, supports millions of containers, and provides a peer‑to‑peer image distribution mechanism.
The runtime exposes APIs compatible with RunC, RunV, and RunLXC, integrates with the CSI storage interface (e.g., Ceph, Pangu), and uses lxcfs to give each container its own view of resource information under /proc, which strengthens isolation. By 2017, 100 % of online services and compute tasks were containerized.
PouchContainer was open‑sourced on 10 Oct 2017. Repository: https://github.com/alibaba/pouch
Cloud‑Native Architecture for Double‑11
The cloud‑native stack separates clusters into online service clusters, compute clusters, and ECS clusters. Sigma can request resources from compute servers to launch Pouch containers, while Fuxi can request resources from Sigma to create its own containers. During the Double‑11 shopping festival a dedicated cloud region isolates traffic, and cross‑datacenter scheduling treats multiple data centers as a single logical computer.
Mixed deployment, time‑slice multiplexing, and elastic scaling reduced additional IT cost for Double‑11 by 50 % and lowered daily IT cost by 30 %, demonstrating the economic impact of unified scheduling and container technologies.
