Flexible Compute Scheduling Practices in the Restaurant Industry: A Yum China Case Study
This article examines the challenges of uneven compute resource distribution across China and presents Yum China's practical approaches—including multi‑unit deployment, dual‑data‑center scheduling, and supporting platforms—to achieve flexible, cost‑effective compute scheduling for the restaurant sector.
With the rapid growth of AI, big data, and cloud computing, demand for compute power has surged while resources remain unevenly distributed: the eastern region faces high electricity costs and scarce land, whereas the western region offers abundant clean energy but lacks data and application scenarios. This imbalance prompted the national "East Data, West Compute" strategy.
As a representative of the restaurant industry, Yum China explores flexible compute scheduling to cut costs and meet varied demand, starting from an analysis of its CPU usage patterns: peak utilization stays below 40%, and idle periods cover two-thirds of the day.
01 – Current Situation
The CPU monitoring curve reveals that most of the day the CPU load is low, indicating ample capacity for offline computing and task scheduling.
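The idle-window analysis behind this observation can be sketched in a few lines. The 20% threshold and the sample curve below are illustrative assumptions, not Yum China's actual monitoring data:

```python
# Sketch: locating idle windows in a day of hourly CPU utilization samples.
# The threshold and the sample curve are illustrative assumptions.

IDLE_THRESHOLD = 0.20  # below this, a unit is a candidate for offline jobs

def idle_windows(hourly_cpu):
    """Return (start_hour, end_hour) spans where load stays under the threshold."""
    windows, start = [], None
    for hour, load in enumerate(hourly_cpu):
        if load < IDLE_THRESHOLD and start is None:
            start = hour
        elif load >= IDLE_THRESHOLD and start is not None:
            windows.append((start, hour))
            start = None
    if start is not None:
        windows.append((start, len(hourly_cpu)))
    return windows

# A day shaped like the article describes: quiet overnight, peaks under 40%.
samples = [0.05] * 8 + [0.35, 0.38] + [0.15] * 2 + [0.39] * 2 + [0.10] * 8 + [0.05] * 2
print(idle_windows(samples))  # → [(0, 8), (10, 12), (14, 24)]
```

The resulting windows are where offline computing and batch tasks can be packed without touching online capacity.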
02 – Prerequisites
All business systems, offline computing, and task scheduling services are containerized and support stateless deployment.
Services can be launched within 5 minutes in any IDC unit.
Comprehensive monitoring ensures automatic traffic switchover during failures to keep online services stable.
Failure monitoring and backup plans are required for scheduling failures.
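The automatic traffic switchover required above could be driven by a streak of failed health probes. A minimal sketch, in which the unit names, the three-probe limit, and the even-split routing are all hypothetical:

```python
# Sketch: monitoring-driven traffic switchover between IDC units.
# Unit names, the probe limit, and the routing scheme are hypothetical.

FAILURE_LIMIT = 3  # consecutive failed probes before traffic is drained

class UnitHealth:
    def __init__(self, name: str):
        self.name = name
        self.failures = 0

    def record_probe(self, ok: bool) -> bool:
        """Update the failure streak; return True if the unit should be drained."""
        self.failures = 0 if ok else self.failures + 1
        return self.failures >= FAILURE_LIMIT

def route_weights(units, drained):
    """Spread traffic evenly across units that are not drained."""
    healthy = [u for u in units if u not in drained]
    return {u: (1.0 / len(healthy) if u in healthy else 0.0) for u in units}

unit_b = UnitHealth("idc-b")
drain = False
for ok in (False, False, False):  # three failed probes in a row
    drain = unit_b.record_probe(ok)

print(route_weights(["idc-a", "idc-b"], {"idc-b"} if drain else set()))
# → {'idc-a': 1.0, 'idc-b': 0.0}
```

Requiring a streak rather than a single failed probe avoids flapping traffic on transient errors.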
03 – Solution
1. Multi‑unit deployment model for compute scheduling: systems are isolated into multiple units, so traffic can be shifted unit by unit and compute resources scaled up or down accordingly.
2. Dual‑data‑center scheduling model: Services support lossless scaling; for example, reduce pod count from 4n to n at 22:00 and scale back to 4n before the 10:00 peak.
3. Fault or surge handling: during low‑traffic periods, a failed unit can be bypassed via A/B traffic switching; during peaks or unexpected spikes, a new unit can be provisioned on public cloud within 5 minutes to absorb the extra load.
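The time-based scaling in point 2 reduces to a simple hour-of-day rule. In the sketch below, the base count and the plain function stand in for what would really be a call to the Kubernetes API (e.g. patching a Deployment's `spec.replicas`); the exact scale-up hour is an assumption consistent with "before the 10:00 peak":

```python
# Sketch of the time-based scaling rule: hold n pods overnight after the
# 22:00 scale-down and run 4n through the business day. BASE and the
# scale-up hour are illustrative.

BASE = 4          # n: minimum pod count kept overnight
PEAK_FACTOR = 4   # run 4n pods during business hours

def target_replicas(hour: int) -> int:
    """Desired pod count for a given hour of day (0-23)."""
    if 9 <= hour < 22:   # scale up an hour before the 10:00 peak
        return BASE * PEAK_FACTOR
    return BASE          # overnight: lossless scale-down to n

print(target_replicas(8), target_replicas(12), target_replicas(23))  # → 4 16 4
```

Because the services are stateless (per the prerequisites), dropping from 4n to n pods is lossless: in-flight requests drain and no session state is stranded.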
04 – Supporting Platforms and Tools
Traffic control platform for automatic or manual traffic scheduling and throttling.
Compute scheduling platform to automate scaling of services across units or data centers.
Monitoring and alerting system to provide real‑time health data for automated traffic decisions.
DTS (Data Transfer Service) for cross‑region database and cache synchronization.
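The throttling role of the traffic control platform can be illustrated with a classic token bucket, which caps sustained request rate while still admitting short bursts. This is a generic sketch, not Yum China's implementation; the rate and capacity numbers are illustrative:

```python
# Sketch: request throttling as a traffic control platform might apply it
# while a spike is being absorbed. Rate and capacity are illustrative.

import time

class TokenBucket:
    """Allow `rate` requests/second with short bursts up to `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last request.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=100.0, capacity=5.0)
results = [bucket.allow() for _ in range(10)]
print(results.count(True))  # roughly the burst capacity passes; the rest are throttled
```

In practice such a limiter buys the 5 minutes needed to bring a new public-cloud unit online before rejecting traffic outright.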
Looking ahead, compute scheduling platforms face both greater opportunities and challenges; advances in intelligence, automation, security, and standardization will be needed to achieve collaborative development and shared benefits.
Yum! Tech Team
How we support the digital platform of China's largest restaurant group—technology behind hundreds of millions of consumers and over 12,000 stores.