
Predictive Auto-Scaling System (PASS) for Large-Scale Enterprise Web Applications

PASS, the Predictive Auto‑Scaling System jointly developed by Meituan and Prof. Chai Yunpeng’s team, combines three components: an ensemble learning‑based prediction algorithm that dynamically selects and calibrates online and offline models, a log‑derived performance model, and a hybrid scaling strategy. Together they forecast QPS bursts, adjust resources proactively, and achieve up to 40% cost savings while maintaining strict QoS guarantees for large‑scale enterprise web services.

Meituan Technology Team

Elastic scaling for large‑scale enterprise services faces two main challenges: accurate load prediction, because instances take time to start and scaling too late degrades QoS, and efficient resource allocation, which must control cost while still meeting service quality.

To address these challenges, Meituan collaborated with Professor Chai Yunpeng’s team from Renmin University of China. Their joint research resulted in the paper “PASS: Predictive Auto‑Scaling System for Large‑scale Enterprise Web Applications,” presented as a Research Full Paper at The Web Conference 2024 (CCF‑A).

Background – Precise QPS time‑series prediction is essential, but workloads exhibit strong periodicity, diverse patterns, and occasional “burst” features that make a single algorithm insufficient. Existing scaling methods (threshold‑based, control‑theoretic target tracking, queueing theory) often fail to guarantee QoS, especially tail‑latency requirements, due to inaccurate performance models.

Exploratory Analysis – An analysis of 225 real Meituan services showed that 92.80% have strong periodicity (ACF > 0.8). Experiments revealed that no single forecasting algorithm dominates across all services; the best algorithm depends on traffic characteristics. Online models achieve higher average accuracy but lag on bursty patterns, while offline models capture bursts but suffer amplitude bias.
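
The periodicity check can be illustrated with a short sketch. It assumes the ACF is evaluated at the lag of one expected cycle (here one day of hourly samples); the article does not state which lag was used, and the function names and synthetic series are hypothetical.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # a constant series has no measurable periodicity
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var

def is_strongly_periodic(series, period, threshold=0.8):
    """Apply the ACF > 0.8 criterion at one candidate period (assumed lag)."""
    return autocorrelation(series, period) > threshold

# Synthetic example: 14 days of hourly QPS with a clean daily cycle.
qps = [1000 + 400 * math.sin(2 * math.pi * (t % 24) / 24)
       for t in range(24 * 14)]
daily_acf = autocorrelation(qps, 24)
print(round(daily_acf, 3), is_strongly_periodic(qps, 24))
```

Real traffic is noisier than this synthetic curve, so in practice the ACF would likely be checked at several candidate periods (daily, weekly) rather than one.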

Technical Solution (PASS) – PASS combines an Ensemble Learning‑based Prediction Algorithm (ELPA) with a log‑based performance model and a hybrid auto‑scaling strategy. ELPA dynamically selects the most suitable online or offline model for each service and applies amplitude calibration to offline predictions. The performance model is built from historical logs (QPS, instance count, QoS metrics) and requires no per‑application profiling. A reactive fallback based on M/M/s queueing theory corrects QoS violations in real time.

ELPA Details – ELPA maintains a pool of online and offline models. It evaluates recent prediction errors using an exponentially weighted cumulative absolute error (V) to decide which model to use. When a “burst” feature is detected (slope ≥ ε), the offline model is activated and its predictions are scaled by a calibrated factor derived from recent error statistics.
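
The selection logic described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the update rule for V (decay times the previous value plus the new absolute error), the two-sample slope test, and the ratio-based calibration factor are all assumptions, as are the names `ModelTracker`, `decay`, and `elpa_predict`.

```python
from dataclasses import dataclass

@dataclass
class ModelTracker:
    name: str
    decay: float = 0.9   # hypothetical weighting; older errors count less
    v: float = 0.0       # exponentially weighted cumulative absolute error

    def update(self, predicted, actual):
        # Assumed form of V: V_t = decay * V_{t-1} + |actual - predicted|
        self.v = self.decay * self.v + abs(actual - predicted)

def select_model(trackers):
    """Pick the model with the smallest recent weighted error."""
    return min(trackers, key=lambda t: t.v).name

def is_burst(recent_qps, epsilon):
    """Assumed burst test: rise between the last two samples >= epsilon."""
    return len(recent_qps) >= 2 and recent_qps[-1] - recent_qps[-2] >= epsilon

def calibrate_offline(offline_pred, recent_actual, recent_offline):
    """Scale the offline forecast by the recent actual/predicted ratio."""
    total_pred = sum(recent_offline)
    if total_pred <= 0:
        return offline_pred  # nothing to calibrate against
    return offline_pred * (sum(recent_actual) / total_pred)

def elpa_predict(trackers, predictions, recent_qps, recent_offline, epsilon):
    """Return (model_name, forecast) for the next step."""
    if is_burst(recent_qps, epsilon):
        window = recent_qps[-len(recent_offline):] if recent_offline else []
        return "offline", calibrate_offline(
            predictions["offline"], window, recent_offline)
    chosen = select_model(trackers)
    return chosen, predictions[chosen]

# Usage: two models scored on two past steps, then one forecast each way.
trackers = [ModelTracker("online"), ModelTracker("offline")]
for online_p, offline_p, actual in [(100, 100, 110), (120, 120, 125)]:
    trackers[0].update(online_p, actual)
    trackers[1].update(offline_p + 20, actual)  # offline has amplitude bias
steady = elpa_predict(trackers, {"online": 130, "offline": 150},
                      [100, 105], [80, 84], epsilon=50)
burst = elpa_predict(trackers, {"online": 130, "offline": 400},
                     [100, 200], [80, 160], epsilon=50)
```

In the burst case the offline forecast of 400 is scaled by the recent actual-to-predicted ratio (300 / 240), which is one simple way to correct the amplitude bias mentioned earlier.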

Performance Model Design – The log‑based model aggregates logs by instance count, computes QoS violation frequencies, and selects the maximum QPS that satisfies a QoS guarantee threshold (δ). Model calibration ensures monotonicity and fills gaps caused by sparse data.
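
A rough sketch of how such a table could be built from logs is shown below. It assumes each log record is a tuple `(instances, qps, violated)`, that QPS is grouped into fixed-width buckets, and that δ is the minimum fraction of non-violating samples a bucket must reach; the linear interpolation and proportional extrapolation used for gap filling are illustrative choices, not details taken from the paper.

```python
from collections import defaultdict

def build_capacity_table(logs, delta=0.99, bucket=100):
    """Map instance count -> highest QPS bucket meeting the guarantee rate."""
    # stats[instances][qps_bucket] = [violations, samples]
    stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for instances, qps, violated in logs:
        cell = stats[instances][(qps // bucket) * bucket]
        cell[0] += int(violated)
        cell[1] += 1
    table = {}
    for instances, buckets in stats.items():
        safe = [q for q, (bad, total) in buckets.items()
                if 1 - bad / total >= delta]
        if safe:
            table[instances] = max(safe)
    return table

def calibrate(table, max_instances):
    """Fill missing instance counts, then force capacity to be non-decreasing."""
    if not table:
        return {}
    known = sorted(table)
    filled = {}
    for n in range(1, max_instances + 1):
        lower = [k for k in known if k < n]
        upper = [k for k in known if k > n]
        if n in table:
            filled[n] = float(table[n])
        elif lower and upper:  # interpolate between the nearest neighbours
            lo, hi = lower[-1], upper[0]
            frac = (n - lo) / (hi - lo)
            filled[n] = table[lo] + frac * (table[hi] - table[lo])
        else:                  # extrapolate proportionally from the nearest point
            ref = lower[-1] if lower else upper[0]
            filled[n] = table[ref] * n / ref
    best = 0.0
    for n in range(1, max_instances + 1):  # running max enforces monotonicity
        best = max(best, filled[n])
        filled[n] = best
    return filled

def instances_needed(capacity, predicted_qps):
    """Smallest instance count whose safe QPS covers the forecast, else None."""
    for n in sorted(capacity):
        if capacity[n] >= predicted_qps:
            return n
    return None

logs = [(2, 150, False), (2, 180, False), (2, 250, True), (2, 260, True),
        (4, 350, False), (4, 320, False), (4, 450, True)]
table = build_capacity_table(logs)
capacity = calibrate(table, max_instances=5)
```

With these sample logs, only 2 and 4 instances appear in the data, so the entries for 1, 3, and 5 instances come entirely from the gap-filling step.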

Hybrid Auto‑Scaling – In addition to predictive scaling, PASS monitors QoS metrics; upon violation, it uses the calibrated queueing model to estimate a higher safe QPS and triggers immediate scaling.
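
For reference, the standard M/M/s (Erlang C) formulas behind such a reactive step can be sketched as below. The sketch treats each instance as one server with an exponential service rate and sizes against mean response time; how PASS calibrates the model and which QoS metric it targets (for example tail latency) is not specified here, so `min_instances` and its parameters are hypothetical.

```python
def erlang_c(servers, offered_load):
    """Probability that an arriving request must queue in an M/M/s system.

    offered_load = arrival_rate / service_rate (in Erlangs).
    """
    if offered_load >= servers:
        return 1.0  # unstable system: every request waits
    # Erlang B via the numerically stable recurrence, then convert to Erlang C.
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    rho = offered_load / servers
    return b / (1 - rho * (1 - b))

def mean_response_time(servers, arrival_rate, service_rate):
    """Mean time in system (queueing delay plus service time), in seconds."""
    load = arrival_rate / service_rate
    if load >= servers:
        return float("inf")
    wait = erlang_c(servers, load) / (servers * service_rate - arrival_rate)
    return wait + 1 / service_rate

def min_instances(arrival_rate, service_rate, target, max_instances=10000):
    """Smallest instance count whose mean response time meets the target."""
    for s in range(1, max_instances + 1):
        if mean_response_time(s, arrival_rate, service_rate) <= target:
            return s
    return None  # target unreachable within the allowed instance range

# Usage: 100 req/s, each instance serves 10 req/s, target 120 ms mean latency.
needed = min_instances(arrival_rate=100, service_rate=10, target=0.12)
```

Because the mean response time can never drop below the service time (1 / service_rate), a target tighter than that returns `None` regardless of how many instances are added.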

Evaluation – Experiments on 225 services, grouped by prediction difficulty into simple, medium, and hard, compared PASS with state‑of‑the‑art predictors (PatchTST, Seasonal Index, Prophet, LSTNet, TiDE) and scaling methods (target tracking, AHPA). PASS achieved the highest prediction accuracy at every difficulty level, the best QoS guarantee rate, and the lowest resource cost (up to 40% reduction in some scenarios). Figures and tables in the original paper illustrate these gains.

Conclusion – PASS demonstrates that a hybrid predictive‑auto‑scaling framework, which adapts model selection and calibrates offline predictions, can substantially improve QoS assurance and resource efficiency for large‑scale cloud services.

Partner Introduction – Professor Chai Yunpeng’s team focuses on cloud computing, databases, and systems research, publishing in top venues such as ASPLOS, SOSP, SIGMOD, and WWW, and actively collaborates with industry to translate research into practice.

Written by

Meituan Technology Team

Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
