
How Qunar Automates Hotel Capacity Planning with Predictive Scaling

This article details Qunar's end‑to‑end solution for forecasting traffic spikes, estimating required CPU resources, and automatically scaling hotel services using a combined flow‑calendar, algorithmic prediction, and Ops‑driven auto‑scaling pipeline, improving stability and operational efficiency.

dbaplus Community

Background

Peak traffic events such as exam‑ticket printing periods, national exams, and major holidays cause sudden surges in hotel booking volume, often exceeding system capacity and leading to performance degradation, throttling, or crashes. Manual capacity calculations for these events are inaccurate and inefficient, prompting the need for an automated, pre‑emptive scaling approach.

Overall Solution

1. System Architecture

The solution integrates three core components:

Flow‑Calendar Platform – aggregates business monitoring data and provides event calendars.

Algorithm Platform – trains predictive models on historical CPU and order data to forecast future CPU demand.

Ops Interface – receives predicted CPU totals, converts them to instance counts, and schedules automatic scaling tasks.

2. Business Process

The event lifecycle consists of nine stages: Pre‑assessment → Pending Evaluation → Evaluating → Evaluation Complete → Task Creation → Scaling → Scale Complete → Review → Closed.

Key steps include:

Collect CPU core data per application and environment from Ops.

Estimate peak order/QPS values for the event and request a prediction from the algorithm platform.

The algorithm platform returns the total CPU cores needed; Ops translates this into the required number of instances and creates timed scaling tasks.
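The last step, turning a predicted CPU total into an instance count, is plain arithmetic. The sketch below assumes every instance of an application has the same number of cores and rounds up so capacity is never short; the function names and parameters are hypothetical, not Qunar's actual Ops interface.

```python
import math


def instances_needed(total_cpu_cores: float, cores_per_instance: int) -> int:
    """Smallest instance count whose combined cores cover the predicted total."""
    if cores_per_instance <= 0:
        raise ValueError("cores_per_instance must be positive")
    if total_cpu_cores < 0:
        raise ValueError("total_cpu_cores must not be negative")
    # Round up: a fractional instance still requires a whole one.
    return math.ceil(total_cpu_cores / cores_per_instance)


def scale_out_delta(total_cpu_cores: float, cores_per_instance: int,
                    current_instances: int) -> int:
    """Instances to add before the event; never negative (no scale-in here)."""
    target = instances_needed(total_cpu_cores, cores_per_instance)
    return max(0, target - current_instances)
```

With illustrative numbers, a prediction of 130 cores on 4-core instances gives 33 instances; if 20 are already running, the scaling task would add 13.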

3. Event Types

Three event categories are supported:

EXAM: Exam‑ticket printing spikes (e.g., graduate or civil service exams).

HOLIDAY: Regular holiday travel peaks without abnormal spikes.

ACTIVITY: Promotional activities such as flash sales that cause sudden traffic spikes.

4. Prediction Methodology

The peak business volume is estimated using the formula:

Estimated Peak = Baseline × (1 + Business Growth Rate)

Business growth rate is calculated as:

Growth Rate = Historical Growth × (1 + Natural Growth + External Impact)

The inputs to these formulas (baseline, historical growth, natural growth, and external impact) come from either manual entry or automated calculation.
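The two formulas compose directly. The sketch below is a literal transcription with hypothetical function names; it assumes all growth terms are expressed as fractions (0.5 meaning 50%), which the article does not state explicitly.

```python
def business_growth_rate(historical_growth: float,
                         natural_growth: float,
                         external_impact: float) -> float:
    """Growth Rate = Historical Growth x (1 + Natural Growth + External Impact)."""
    return historical_growth * (1 + natural_growth + external_impact)


def estimated_peak(baseline: float, growth_rate: float) -> float:
    """Estimated Peak = Baseline x (1 + Business Growth Rate)."""
    return baseline * (1 + growth_rate)


# Illustrative inputs only, not figures from the article.
rate = business_growth_rate(historical_growth=0.5,
                            natural_growth=0.1,
                            external_impact=0.1)
peak = estimated_peak(baseline=10_000, growth_rate=rate)
```

Here the growth rate works out to about 0.6 (0.5 × 1.2), so a baseline of 10,000 orders yields an estimated peak of about 16,000.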

5. Evaluation and Metrics

After scaling, the system evaluates prediction accuracy using:

Coverage Rate = (Predicted ∩ Actual) / Actual

Accuracy Rate = (Predicted ∩ Actual) / Predicted

Additional indicators such as MAPE (0.08), order‑CPU correlation (0.91), and average CPU usage (32.5) are tracked.
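The article does not define what "Predicted" and "Actual" contain. One plausible reading, assumed in the sketch below, treats them as sets of application names: those the system predicted would need scaling and those that actually did. Under that reading, coverage corresponds to recall and accuracy to precision. MAPE is computed with its standard definition, as a fraction rather than a percentage, to match the reported 0.08. All function names are hypothetical.

```python
from typing import Sequence, Set


def coverage_rate(predicted: Set[str], actual: Set[str]) -> float:
    """|Predicted ∩ Actual| / |Actual| (assumption: sets of application names)."""
    if not actual:
        raise ValueError("actual set is empty")
    return len(predicted & actual) / len(actual)


def accuracy_rate(predicted: Set[str], actual: Set[str]) -> float:
    """|Predicted ∩ Actual| / |Predicted|."""
    if not predicted:
        raise ValueError("predicted set is empty")
    return len(predicted & actual) / len(predicted)


def mape(actual_values: Sequence[float],
         predicted_values: Sequence[float]) -> float:
    """Mean absolute percentage error, returned as a fraction."""
    if len(actual_values) != len(predicted_values) or not actual_values:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(a - p) / abs(a)
              for a, p in zip(actual_values, predicted_values)]
    return sum(errors) / len(errors)
```

For instance, if three applications were predicted and four actually needed scaling, with two in common, coverage is 2/4 and accuracy is 2/3. Note that `mape` is undefined when an actual value is zero.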

6. Scaling Policies

Safety limits are enforced:

Maximum replica limit – prevents creating more instances than downstream resources allow.

Minimum replica limit – ensures a lower bound based on a configurable percentage.
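Both limits amount to clamping the computed target. The article does not say what the minimum percentage is taken of; the sketch below assumes it is a percentage of the current replica count, and lets the maximum win if the two limits conflict. The function and parameter names are hypothetical.

```python
import math


def clamp_replicas(desired: int, current: int,
                   max_replicas: int, min_percent: float) -> int:
    """Apply the safety limits to a desired replica count.

    Assumption: the lower bound is `min_percent` percent of the current
    replica count, rounded up. The article does not specify the base.
    """
    if max_replicas < 0 or current < 0:
        raise ValueError("replica counts must not be negative")
    floor = math.ceil(current * min_percent / 100)
    # Raise to the floor first, then cap, so the maximum always holds.
    return min(max_replicas, max(floor, desired))
```

With 20 current replicas, a cap of 40, and a 50% floor, a desired count of 50 is cut to 40, a desired count of 5 is raised to 10, and 25 passes through unchanged.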

7. Review and Continuous Improvement

Post‑event reviews compare predicted and actual growth to refine models. The model is retrained regularly with recent peak events, and offline validation ensures new models meet or exceed online performance.

8. Project Impact and Value

Key outcomes:

150+ applications onboarded, covering >90% of hotel service CPU capacity.

Average coverage of predicted applications: 96%.

Average prediction accuracy: 89%.

Each peak event saves ~3 person‑days of manual ops, totaling ~270 person‑days annually.

Resource prediction reduces manual provisioning by ~20%.

9. Future Plans

Planned enhancements include expanding intelligent scaling to all applications and infrastructure layers (including physical servers, KVM, DB/Redis), improving algorithm accuracy with AI, and extending the solution across all business lines for enterprise‑wide resource orchestration.
