
Predictive Auto-Scaling System (PASS) for Large-Scale Enterprise Web Applications

This article presents the PASS system, a predictive auto‑scaling solution developed by Meituan and Prof. Chai’s team, detailing the challenges of load forecasting and resource allocation, the ELPA ensemble model, performance modeling, hybrid scaling design, and extensive experiments that show superior QoS guarantees and lower resource costs compared to existing methods.

Meituan Technology Team

Feb 13, 2025


Background

Elastic scaling for large‑scale enterprise web services faces two core challenges: (1) accurate load prediction, because instance startup incurs a warm‑up period that makes purely reactive scaling degrade QoS; (2) efficient resource allocation that guarantees QoS (including tail latency) while minimizing cost.

Exploration Analysis

Prediction Experiment Exploration

Finding 1: No single forecasting algorithm dominates across all time‑series categories. A diversified set of predictors is required for Meituan’s heterogeneous traffic.

Periodicity detection on 225 real service streams showed that 92.80% of applications exhibit strong periodicity (autocorrelation > 0.8), 4.55% show weak periodicity (0.5 – 0.8), and only 2.65% have no clear periodicity (< 0.5). Hence most services can be forecast reliably.
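The three periodicity classes can be illustrated with a minimal sketch. It assumes the score is the sample autocorrelation at one candidate period (for example, one day of samples); the article does not say which lag or estimator was used, and the function names are hypothetical.

```python
from math import sin, pi

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at `lag` (assumed estimator)."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # a constant series carries no periodic signal
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

def classify_periodicity(series, period):
    """Map the autocorrelation at `period` to the article's three classes."""
    r = autocorrelation(series, period)
    if r > 0.8:
        return "strong"
    if r >= 0.5:
        return "weak"
    return "none"

# Synthetic hourly load with a clean 24-sample cycle over 20 days.
daily = [100 + 50 * sin(2 * pi * i / 24) for i in range(24 * 20)]
print(classify_periodicity(daily, period=24))  # strong
```

Real traffic is noisier than this synthetic sine wave, so in practice the candidate period would itself need to be chosen or searched for.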

Experiments on three representative services demonstrated dynamic diversity: for Service 1 the seasonal‑index method outperformed PatchTST, while for Service 2 PatchTST was superior.

Figure 1: Accuracy and robustness comparison of various forecasting algorithms

Finding 2: Online prediction (short horizon, e.g., 15 min) yields higher average accuracy but fails on “burst” features; offline prediction (long horizon, e.g., 1 day) captures bursts but suffers amplitude bias, creating a “dirty interval” where predictions lag the true values.

Figure 1(c): Dirty interval illustration

Scaling Method Analysis

Finding 3: Common cloud‑platform scaling methods (threshold‑based, target‑tracking) fail to guarantee QoS when the QoS target is tail latency (TP999); their QoS guarantee rates drop sharply.

Finding 4: The performance models behind these methods are inaccurate. Threshold and target‑tracking assume QoS is met when QPS or CPU utilization stays within a heuristically set range. Queue‑theoretic models (e.g., M/M/s) rely on exponential inter‑arrival assumptions that do not hold in practice, leading to underestimated latency and QoS violations. Table 1 summarizes QoS guarantee rates and resource costs for three typical methods.

Table 1: QoS guarantee rates and resource costs of three common scaling methods

Technical Solution

PASS (Predictive Auto‑Scaling System) integrates an ensemble prediction framework (ELPA), a log‑based performance model, and a hybrid auto‑scaling strategy.

ELPA Prediction Model

ELPA selects the most suitable online model for the current time series and, when a burst is detected, switches to the best offline model with amplitude calibration.

Model selection compares an exponentially weighted cumulative absolute error metric, V_on for online models and V_off for offline models, over recent cycles:

V = β·E_current + (1‑β)·V_previous

Here E_current is the absolute error of the current cycle, V_previous is the metric's previous value, and β weights the most recent error.

Burst detection uses the slope between consecutive points:

K(i‑1, i) = (x_i − x_{i‑1}) / Δt

If K ≥ ε, the segment is marked as a burst.
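The selection rule and the burst test can be sketched as follows. This is an illustration under assumptions, not the ELPA implementation: the function names, the default β, and the idea of keeping one score per model in a dictionary are all hypothetical.

```python
def update_error_score(v_previous, e_current, beta=0.5):
    """V = beta * E_current + (1 - beta) * V_previous."""
    return beta * e_current + (1.0 - beta) * v_previous

def is_burst(x_prev, x_curr, dt, epsilon):
    """Mark a burst when the slope K = (x_i - x_{i-1}) / dt reaches epsilon."""
    return (x_curr - x_prev) / dt >= epsilon

def best_model(scores):
    """Return the model name with the lowest weighted error score."""
    return min(scores, key=scores.get)

def select_model(online_scores, offline_scores, x_prev, x_curr, dt, epsilon):
    """Use the best offline model during a burst, else the best online model."""
    if is_burst(x_prev, x_curr, dt, epsilon):
        return "offline", best_model(offline_scores)
    return "online", best_model(online_scores)

# Hypothetical scores; the model names come from the article's baselines.
online = {"PatchTST": 3.0, "LSTNet": 2.0}
offline = {"Prophet": 5.0, "seasonal_index": 4.0}
print(select_model(online, offline, x_prev=100, x_curr=400, dt=5, epsilon=50))
```

In this sketch a larger β makes the score react faster to the latest error, while a smaller β favors models that have been accurate over a longer history.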

Amplitude calibration rescales the offline prediction by a factor derived from the average ratio of real to offline values over a recent window [i‑a, i):

scale = avg_{j∈[i‑a, i)} (real_j / offline_j)

adjusted_offline_i = offline_i × scale
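A minimal sketch of the calibration step, assuming `real` and `offline` are equally spaced lists indexed by time step. Skipping zero offline values and falling back to a scale of 1.0 are assumptions added here to keep the example safe; the article does not describe how such cases are handled.

```python
def amplitude_scale(real, offline, i, a):
    """Average of real_j / offline_j over the window [i - a, i)."""
    if a <= 0 or i - a < 0:
        raise ValueError("window [i - a, i) must lie inside the history")
    ratios = [real[j] / offline[j]
              for j in range(i - a, i)
              if offline[j] != 0]  # assumption: ignore zero predictions
    if not ratios:
        return 1.0  # assumption: leave the prediction unchanged
    return sum(ratios) / len(ratios)

def calibrated_offline(real, offline, i, a):
    """Offline prediction at step i, rescaled by the recent amplitude ratio."""
    return offline[i] * amplitude_scale(real, offline, i, a)

# The offline model under-predicted by roughly 20% over the last three steps.
real = [120, 130, 110]
offline = [100, 100, 100, 200]
print(calibrated_offline(real, offline, i=3, a=3))
```

With these numbers the ratios are 1.2, 1.3, and 1.1, so the offline value of 200 is scaled up to about 240.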

Performance Model Design

The log‑based performance model is built from historical monitoring logs (QPS, instance count, QoS metrics). Construction steps:

Aggregate logs by instance count.

Count QoS violations for each QPS bucket.

Sort entries so that those with a QoS guarantee rate ≥ δ (default 0.99) come first, then by descending QPS.

The top entry defines the maximum traffic the current instance pool can handle.
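The construction steps above can be sketched as follows. The record layout `(instance_count, qps, qos_met)`, the bucket width, and the choice to skip instance counts with no qualifying bucket are assumptions made for illustration; the article does not specify them.

```python
from collections import defaultdict

def build_performance_model(records, bucket_size=10, delta=0.99):
    """Map each instance count to the highest QPS bucket meeting the QoS target.

    records: iterable of (instance_count, qps, qos_met) log samples.
    """
    # stats[instance_count][bucket] = [total samples, QoS violations]
    stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for instances, qps, qos_met in records:
        bucket = int(qps // bucket_size) * bucket_size  # lower edge of bucket
        cell = stats[instances][bucket]
        cell[0] += 1
        if not qos_met:
            cell[1] += 1

    model = {}
    for instances, buckets in stats.items():
        # Sorting by "rate >= delta" then by descending QPS and taking the top
        # entry is equivalent to taking the largest qualifying bucket.
        qualifying = [b for b, (total, bad) in buckets.items()
                      if (total - bad) / total >= delta]
        if qualifying:
            model[instances] = max(qualifying)
    return model

logs = ([(5, 12, True)] * 100
        + [(5, 25, True)] * 199 + [(5, 25, False)]
        + [(5, 38, True)] * 90 + [(5, 38, False)] * 10
        + [(6, 41, True)] * 50)
print(build_performance_model(logs))
```

In this example the 30–39 QPS bucket for five instances only reaches a 90% guarantee rate, so the model records 20 as the highest bucket that five instances can sustain.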

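Model calibration enforces monotonic growth and fills missing QPS entries by linear interpolation. Example: in the raw mapping {5:30, 7:20, 8:60}, the entry 7:20 breaks monotonic growth (20 < 30), so it is discarded; interpolating linearly between 5:30 and 8:60 then gives {5:30, 6:40, 7:50, 8:60}.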
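One way to reproduce this example is sketched below. It assumes the keys are instance counts and the values are the maximum sustainable QPS, and that an entry is dropped whenever it does not exceed the last kept value; the article shows only the input and output, so the exact rule is an assumption. `required_instances` is a hypothetical lookup helper.

```python
def calibrate(raw):
    """Drop non-increasing entries, then interpolate every missing key."""
    if not raw:
        return {}
    kept = []
    for n in sorted(raw):
        # Keep an entry only if it grows past the last kept capacity.
        if not kept or raw[n] > kept[-1][1]:
            kept.append((n, raw[n]))

    model = {}
    for (n0, q0), (n1, q1) in zip(kept, kept[1:]):
        for n in range(n0, n1):
            model[n] = q0 + (q1 - q0) * (n - n0) / (n1 - n0)
    last_n, last_q = kept[-1]
    model[last_n] = last_q
    return model

def required_instances(model, qps):
    """Smallest instance count whose capacity covers `qps`, or None."""
    for n in sorted(model):
        if model[n] >= qps:
            return n
    return None

model = calibrate({5: 30, 7: 20, 8: 60})
print(model)                          # {5: 30.0, 6: 40.0, 7: 50.0, 8: 60}
print(required_instances(model, 45))  # 7
```

Returning `None` when the requested QPS exceeds every known capacity leaves the extrapolation policy to the caller, since the article does not describe one.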

Figure 4: Performance model construction flow

Hybrid Auto‑Scaling Design

Beyond predictive scaling, PASS adds a reactive fallback based on an M/M/s queueing model. When a QoS violation is detected, PASS recalculates the required QPS using the queueing formula (which tends to over‑estimate QPS relative to latency) and queries the performance model to provision the appropriate number of instances. Experiments show this fallback adds negligible extra resource waste.
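For reference, the textbook M/M/s results can be computed as in the sketch below. This is the standard Erlang C formula, not the exact calculation PASS performs, which the article does not spell out; the function names and the `max_servers` cap are hypothetical. It also covers mean latency only, not tail latency such as TP999.

```python
def erlang_c(servers, arrival_rate, service_rate):
    """Probability that an arriving request must queue in an M/M/s system."""
    offered = arrival_rate / service_rate      # offered load in Erlangs
    rho = offered / servers                    # per-server utilization
    if rho >= 1.0:
        return 1.0                             # unstable: every request waits
    blocking = 1.0                             # Erlang B with zero servers
    for k in range(1, servers + 1):
        # Numerically stable Erlang B recursion.
        blocking = offered * blocking / (k + offered * blocking)
    return blocking / (1.0 - rho * (1.0 - blocking))

def mean_response_time(servers, arrival_rate, service_rate):
    """Mean queueing delay plus mean service time."""
    if arrival_rate >= servers * service_rate:
        return float("inf")
    wait = (erlang_c(servers, arrival_rate, service_rate)
            / (servers * service_rate - arrival_rate))
    return wait + 1.0 / service_rate

def min_servers(arrival_rate, service_rate, target_latency, max_servers=10000):
    """Smallest server count whose mean response time meets the target."""
    for s in range(1, max_servers + 1):
        if mean_response_time(s, arrival_rate, service_rate) <= target_latency:
            return s
    return None

# One request per second, one second of service time, 1.5 s latency target.
print(min_servers(arrival_rate=1.0, service_rate=1.0, target_latency=1.5))  # 2
```

Because the model's assumptions rarely hold for real traffic, as Finding 4 notes, such a formula is better suited to a conservative fallback than to the primary capacity estimate.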

Test Results

Experiment Environment

225 applications from Meituan were sampled and classified by difficulty: 164 simple (single‑waveform), 48 medium (spike or square patterns), 13 hard (mixed patterns). Representative backend services (user profile, behavior query, search, chat) were recorded, replayed offline, and sliced to focus on scaling‑relevant load.

Baselines:

Offline algorithms: Seasonal index, Prophet.

Online algorithms: LSTNet, PatchTST, TiDE (prediction horizon = 3 steps = 15 min, matching instance startup time).

Scaling baselines: Target‑tracking and Alibaba’s AHPA (queue‑theoretic). Threshold‑based scaling was omitted because target‑tracking consistently outperformed it.

Prediction Algorithm Evaluation

Conclusion 1: ELPA consistently achieves the highest prediction accuracy across all difficulty levels.

Table 2: Accuracy summary of various prediction algorithms across datasets

ELPA outperforms single algorithms by dynamically selecting the optimal online/offline combination and applying amplitude calibration.

Conclusion 2: Offline models capture burst features but exhibit significant amplitude deviation; online models lack burst handling. The ensemble with calibration yields robust performance.

Figure 4: Example predictions of online model, offline model (with amplitude adjustment), and ELPA

End‑to‑End Evaluation

Conclusion 3: PASS’s performance model delivers the highest QoS guarantee rates and the lowest resource costs across all test scenarios. Compared with target‑tracking and AHPA, PASS improves the average QoS guarantee rate by 5.54% and 7.71%, and reduces average resource cost by 8.91% and 17.02%, respectively. In scenario 4, resource cost drops by up to 40% and 52.76%.

The two metrics are defined as:

QoS guarantee rate = (time QoS satisfied) / (total time)

Resource cost = ∫ (instance count) dt, with t in hours

Detailed monitoring data (QPS, instance count, TP99/TP999) are shown in Figures 5 and 6.
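Both metrics are straightforward to compute from monitoring samples. The sketch below assumes equally spaced samples and approximates the integral by a sum; the 5‑minute default step is inferred from the "3 steps = 15 min" horizon and is otherwise an assumption, as are the function names.

```python
def qos_guarantee_rate(qos_met_flags):
    """Fraction of sampled intervals in which the QoS target was met."""
    flags = list(qos_met_flags)
    if not flags:
        raise ValueError("at least one sample is required")
    return sum(1 for ok in flags if ok) / len(flags)

def resource_cost(instance_counts, step_minutes=5):
    """Integral of the instance count over time, in instance-hours."""
    return sum(instance_counts) * step_minutes / 60

print(qos_guarantee_rate([True, True, False, True]))    # 0.75
print(resource_cost([10, 10, 12, 12], step_minutes=15)) # 11.0
```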

Instance startup currently lacks pre‑warming (e.g., DB connection initialization), so even proactive scaling can experience a short tail‑latency spike. Adding pre‑warm steps would further improve QoS guarantees.

Experience Summary

Enterprise scenarios vary in complexity; algorithms published at top conferences may not fit every case, so they need careful selection and simplification.

Model complexity does not guarantee performance; algorithm choice should match traffic features and scenario constraints.

Deployability matters: for example, LSTNet supports multi‑series prediction, which reduces deployment overhead for large‑scale services.
