
Meta Reinforcement Learning Framework for Predictive Autoscaling in Cloud Environments

This article presents a cloud-native, end‑to‑end autoscaling solution that integrates traffic forecasting, CPU utilization meta‑prediction, and a reinforcement‑learning‑based scaling decision module into a fully differentiable system, achieving higher resource utilization and cost efficiency as demonstrated by ACM SIGKDD 2022 research.

AntTech

Rationalizing data center resources is a challenging problem: the average CPU utilization of Ant Group's applications is below 10%. To improve efficiency, the team built an intelligent, fully managed capacity system that performs both scheduled and traffic-aware predictive autoscaling.

The solution, described in the ACM SIGKDD 2022 paper "A Meta Reinforcement Learning Framework for Predictive Autoscaling in the Cloud," combines traffic prediction and scaling decisions into a single, fully differentiable reinforcement-learning pipeline that outperforms prior state-of-the-art methods.

Workload Forecaster: A lightweight attentional encoder-decoder model predicts future traffic from historical data, explicitly decomposing periodicity and leveraging attention for multi-step forecasts.
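To make the idea concrete, here is a toy NumPy sketch of the attention mechanism behind such a forecaster: each future step is predicted as an attention-weighted average of historical points, with scores favoring history at the same phase of the period. The scoring rule and sharpness factor are illustrative stand-ins for the paper's learned encoder-decoder, not its actual architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_forecast(history, horizon, period):
    """Toy stand-in for an attentional encoder-decoder forecaster:
    each future step attends over the history, with scores favoring
    points at the same phase within the period (a crude form of
    explicit periodicity decomposition)."""
    history = np.asarray(history, dtype=float)
    n = len(history)
    preds = []
    for h in range(horizon):
        target_phase = (n + h) % period
        # sharper (x5) scores concentrate attention on same-phase history
        scores = np.array([-5.0 * abs((i % period) - target_phase)
                           for i in range(n)])
        preds.append(float(softmax(scores) @ history))
    return np.array(preds)
```

On a perfectly periodic series the forecast simply reproduces the cycle; a learned model would additionally capture trend and noise.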

CPU Utilization Meta-Predictor: Using a meta-learning approach (Attentive Neural Process), a single model maps traffic features to CPU usage across thousands of services, producing task embeddings that inform downstream decisions.
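The cross-attention step of an Attentive Neural Process can be sketched as follows: each target traffic point attends over a service's context set of (traffic, CPU) observations, and the context is also summarized into a per-service task embedding. Here the learned encoders are replaced with a Gaussian kernel and a simple mean pooling for illustration; only the attention structure mirrors the ANP.

```python
import numpy as np

def anp_style_predict(ctx_x, ctx_y, tgt_x, length_scale=0.5):
    """Sketch of ANP-style cross-attention: target traffic points
    attend over context (traffic, CPU) pairs of one service.
    A Gaussian kernel stands in for learned query/key encoders."""
    ctx_x, ctx_y, tgt_x = map(np.asarray, (ctx_x, ctx_y, tgt_x))
    # attention scores: similarity between target and context traffic
    d2 = (tgt_x[:, None] - ctx_x[None, :]) ** 2
    w = np.exp(-d2 / (2 * length_scale ** 2))
    w /= w.sum(axis=1, keepdims=True)
    preds = w @ ctx_y                          # attended CPU prediction
    # crude task embedding: mean context representation per service
    task_embedding = np.array([ctx_x.mean(), ctx_y.mean()])
    return preds, task_embedding
```

In the paper's setting the task embedding is what lets one shared model adapt to thousands of heterogeneous services.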

Scaling Decider: A meta model-based reinforcement-learning agent treats autoscaling as a Markov Decision Process, where the state includes traffic forecasts, CPU embeddings, and utilization; the reward balances target CPU usage and scaling frequency; actions are scaling ratios. Model-based RL accelerates convergence by incorporating learned dynamics.
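The reward shape described above can be sketched as a penalty on deviation from the target utilization plus a fixed cost per scaling action to discourage thrashing. The coefficients, the linear capacity model, and the 25% default target are illustrative assumptions, not the paper's tuned values.

```python
def reward(cpu_util, scaled, target=0.25, alpha=1.0, beta=0.1):
    """Illustrative reward: penalize distance from target CPU
    utilization, plus a fixed cost whenever a scaling action fires
    (alpha/beta are assumed coefficients, not the paper's values)."""
    return -alpha * abs(cpu_util - target) - (beta if scaled else 0.0)

def step(replicas, traffic, ratio, capacity_per_replica=100.0):
    """Apply a scaling-ratio action; CPU utilization follows a toy
    linear model (traffic / total capacity) in place of the learned
    meta-predictor."""
    new_replicas = max(1, round(replicas * ratio))
    cpu = traffic / (new_replicas * capacity_per_replica)
    return new_replicas, cpu, reward(cpu, new_replicas != replicas)
```

A no-op action at exactly the target utilization earns zero reward; doubling replicas to absorb doubled traffic keeps utilization on target but pays the scaling cost.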

The MDP formulation defines state, reward, action, and transition functions, with the transition model learned from historical data to simulate environment dynamics.
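A minimal version of that idea: fit a dynamics model on logged (state, action, next-state) tuples, then roll out the policy against the fitted model instead of production. A least-squares linear model stands in here for whatever learned transition model the paper uses.

```python
import numpy as np

def fit_transition(states, actions, next_states):
    """Fit a linear dynamics model s' ~ [s, a] @ A from historical
    logs (least squares stands in for the paper's learned model)."""
    X = np.hstack([states, actions])
    A, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return A

def rollout(A, s0, policy, horizon):
    """Simulate environment dynamics with the fitted model so the
    agent can evaluate scaling policies without touching production."""
    s, traj = np.asarray(s0, dtype=float), []
    for _ in range(horizon):
        a = policy(s)
        s = np.concatenate([s, a]) @ A
        traj.append(s)
    return traj
```

Because the rollout never queries the real cluster, the agent can try many candidate scaling policies cheaply, which is the convergence speedup model-based RL buys.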

Offline experiments and online deployments show the system maintains CPU utilization around 25% during peak hours and significantly improves resource efficiency, moving towards a serverless‑like experience for online services.

References include works on attentive neural processes, deep reinforcement learning, soft actor‑critic, and model‑based RL.

cloud computing · autoscaling · reinforcement learning · meta-learning · capacity management · predictive modeling
Written by AntTech

Technology is the core driver of Ant's future creation.