MagicScaler: Achieving High QoS and Low Cost with Uncertainty‑Aware Autoscaling
The MagicScaler framework, introduced by Alibaba Cloud’s big‑data engineering team and collaborators, combines a multi‑scale attention Gaussian process predictor with an uncertainty‑aware elastic scaling decision engine. On real MaxCompute workloads, it delivers higher quality‑of‑service (QoS) at lower operational cost than traditional autoscaling methods.
Opening
Recently, Alibaba Cloud's big data engineering team together with the MaxCompute team, East China Normal University, and DAMO Academy published the paper “MagicScaler: Uncertainty‑aware, Predictive Autoscaling”, which was accepted at VLDB 2023.
Background
As cloud demand grows, allocating resources based on user demand is crucial for stability and cost control. Three typical scaling strategies are shown in Figure 1: Conservative (high cost, low QoS risk), Passive (low cost, high QoS risk), and Predictive Autoscaling, which aims to anticipate demand.
Challenges
Real workloads exhibit high complexity, uncertainty, and granularity‑sensitive temporal dependencies (Figure 2), making accurate demand prediction difficult and posing challenges for proactive scaling.
Solution: MagicScaler
MagicScaler consists of a predictor and a scheduler.
Predictor
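The predictor is a multi‑scale attention Gaussian process regression model. Its multi‑scale feature extraction stage (MAFE) derives features at several time scales from historical demand sequences and feeds them into a Gaussian Process Regression (GPR) model. The GPR then produces demand forecasts with quantified uncertainty, that is, a predicted value together with a measure of how far actual demand may deviate from it (Figure 4).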
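To make "forecasts with quantified uncertainty" concrete, here is a minimal sketch of plain Gaussian process regression over a time index, written with the Python standard library only. It is not the paper's predictor: the MAFE attention features are omitted, and the RBF kernel, its hyperparameters, and the names gp_predict and upper_bound are assumptions chosen for illustration.

```python
import math
import statistics

def rbf(a, b, length=2.0):
    # Squared-exponential kernel with unit variance (hypothetical hyperparameters).
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    # Lower-triangular L with L * L^T = A, for a symmetric positive-definite A.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    # Forward substitution: solve L y = b.
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    # Back substitution: solve L^T x = y.
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / L[i][i]
    return x

def gp_predict(xs, ys, x_new, noise=0.1):
    # Standardize the targets so the unit-variance prior is on a sensible scale.
    mean_y = statistics.fmean(ys)
    scale = statistics.pstdev(ys)
    zs = [(y - mean_y) / scale for y in ys]
    # Kernel matrix of the training inputs plus observation noise on the diagonal.
    K = [[rbf(a, b) + (noise ** 2 if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, zs))
    k_star = [rbf(x_new, a) for a in xs]
    # Posterior mean and variance of the latent demand at x_new.
    mu = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = rbf(x_new, x_new) - sum(t * t for t in v)
    return mean_y + scale * mu, scale * math.sqrt(max(var, 0.0))

def upper_bound(mu, sd, z=1.645):
    # One-sided upper bound on demand; z = 1.645 corresponds to roughly 95%.
    return mu + z * sd

history_t = list(range(8))
history_demand = [10, 12, 15, 14, 11, 9, 10, 13]
mu, sd = gp_predict(history_t, history_demand, 8.0)
print(f"forecast={mu:.2f}, std={sd:.2f}, upper bound={upper_bound(mu, sd):.2f}")
```

The standard deviation grows as the query point moves away from the observed history, which is the signal a scheduler can use to provision more conservatively when the forecast is less certain.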
Scheduler
The scheduler treats elastic scaling as a Markov Decision Process (MDP). Using rolling‑horizon optimization, it approximates the solution of an infinite‑horizon Bellman equation, balancing resource cost against QoS violation risk (Figure 5).
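The rolling‑horizon idea can be sketched as a small finite‑horizon dynamic program: given Gaussian forecasts (mean, standard deviation) for the next few steps, choose the capacity sequence that minimizes resource cost plus a penalty on the probability of under‑provisioning, apply only the first decision, then re‑plan at the next step. The cost terms, their weights, the discrete capacity levels, and the function names below are assumptions for illustration, not the formulation used in the paper.

```python
import math

def violation_prob(capacity, mu, sigma):
    # P(demand > capacity) when demand ~ Normal(mu, sigma^2).
    z = (capacity - mu) / sigma
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

def plan_first_action(current, forecasts, levels,
                      unit_cost=1.0, switch_cost=0.5, qos_penalty=50.0):
    # Backward induction over the forecast window.
    # value[c] holds the minimal cost-to-go after choosing capacity c.
    value = {c: 0.0 for c in levels}
    best_first = None
    for t in reversed(range(len(forecasts))):
        mu, sigma = forecasts[t]
        # At the first step the previous capacity is known; later it is a state.
        prev_levels = [current] if t == 0 else levels
        new_value = {}
        for prev in prev_levels:
            best_cost, best_c = math.inf, None
            for c in levels:
                cost = (unit_cost * c                      # resource cost
                        + switch_cost * abs(c - prev)      # scaling overhead
                        + qos_penalty * violation_prob(c, mu, sigma)
                        + value[c])                        # cost-to-go
                if cost < best_cost:
                    best_cost, best_c = cost, c
            new_value[prev] = best_cost
            if t == 0:
                best_first = best_c
        value = new_value
    return best_first

levels = list(range(0, 31))
confident = [(10.0, 1.0)] * 3   # hypothetical forecasts: (mean, std) per step
uncertain = [(10.0, 3.0)] * 3
print(plan_first_action(10, confident, levels))
print(plan_first_action(10, uncertain, levels))
```

With the same mean forecast, the wider uncertainty leads the planner to hold more headroom, which is the cost versus QoS trade‑off described above. Truncating the horizon and re‑planning each step is what stands in for solving the infinite‑horizon problem directly.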
Evaluation
Experiments on three MaxCompute clusters with real workloads show that MagicScaler outperforms classic autoscaling algorithms in both cost and QoS.
Future Work
Further research will explore integrating MagicScaler with existing MaxCompute scheduling strategies.
