How Meituan Optimized Delivery Ad Compute with Intelligent Power Allocation
This article describes Meituan's delivery advertising system's shift to intelligent compute allocation, detailing the business challenges, the four‑element framework for dynamic resource distribution, the optimal tier decision process, system stability mechanisms, experimental results, and future research directions.
1. Business Background
Meituan Waimai processes over 40 million orders daily, and its advertising service now spans more than ten business lines, leading to massive machine‑resource consumption. Traffic shows a clear dual‑peak pattern (lunch and dinner), causing high‑load pressure during peaks and significant compute waste during off‑peak periods, resulting in low overall compute‑allocation efficiency.
2. Overall Idea
The advertising engine uses a funnel‑style cascade architecture (recall, coarse‑ranking, fine‑ranking, mechanism). Intelligent compute aims to allocate compute differentially according to traffic value under system capacity constraints, improving efficiency and maximizing revenue. Four key elements are defined:
Traffic‑value quantification: measuring the revenue a request generates for the platform, advertisers, and users.
Traffic‑compute quantification: measuring the machine resources a request consumes, which depend on candidate‑set size, number of recall channels, model size, and link complexity.
System compute capacity quantification: the total machine resources available, obtained via stress testing.
Intelligent compute allocation: defining “elastic actions” (elastic queue, elastic model, elastic channel, elastic link) and their corresponding “elastic tiers”.
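The four elastic actions and their tiers can be thought of as a small discrete configuration space. A minimal sketch of how that might be encoded (all names, tier counts, and values here are hypothetical, not Meituan's actual configuration):

```python
from dataclasses import dataclass

# Hypothetical encoding of elastic actions and tiers. Each tier fixes one
# setting per elastic action, so the decision module picks a single tier per
# request instead of tuning four knobs independently.
@dataclass(frozen=True)
class ElasticTier:
    queue_len: int      # elastic queue: candidate-set size entering ranking
    model: str          # elastic model: which ranking model variant to run
    channels: int       # elastic channel: number of recall channels queried
    full_link: bool     # elastic link: run the full pipeline or a shortcut

TIERS = [
    ElasticTier(queue_len=200,  model="small", channels=3, full_link=False),
    ElasticTier(queue_len=500,  model="base",  channels=5, full_link=True),
    ElasticTier(queue_len=1000, model="large", channels=8, full_link=True),
]
```

Collapsing the actions into joint tiers keeps the online decision a simple argmax over a handful of options rather than a combinatorial search.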
Challenge Analysis
Problem solving: optimizing compute allocation under system constraints to maximize traffic revenue.
System stability: ensuring the intelligent compute framework remains stable and the whole pipeline runs smoothly.
Generality & extensibility: supporting both advertising recommendation and search, and allowing new business scenarios to be integrated easily.
3. Solution Design
The co‑designed framework consists of decision, collection, and control components: the decision component performs optimal tier selection, while the collection and control components underpin system stability.
3.1 Optimal Tier Decision
Improvements over the DCAF baseline include:
Using more generic traffic‑compute metrics and adding a compute‑prediction module to improve accuracy.
Combining elastic queue and elastic model actions to handle traffic that cannot be modeled by queue length alone.
3.1.1 Problem Modeling
The original DCAF treats compute solely as queue length, which is insufficient for Meituan's diverse traffic. We extend the model to incorporate additional compute factors and introduce a compute‑prediction module.
3.1.2 Decision Framework
The optimal tier decision consists of offline and online stages, including traffic‑value prediction, traffic‑compute prediction, offline λ solving (binary search on replayed traffic), and online decision (calculating the best tier per request).
3.1.3 Traffic‑Value Prediction
We use an offline XGBoost model to estimate platform and merchant revenue, store bucketed values in a KV table, and retrieve them online via a lightweight lookup.
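The bucket‑then‑lookup pattern described above can be sketched as follows. This is a simplified stand‑in, not Meituan's implementation: offline, a value model (XGBoost in the article) scores historical requests, scores are averaged per feature bucket and written to a KV store; online, prediction reduces to a single lookup. The bucket‑key format is invented for illustration.

```python
from collections import defaultdict

def build_value_table(samples):
    """Offline: aggregate model scores per bucket into a KV table.

    samples: iterable of (bucket_key, predicted_value) pairs, e.g. the
    output of scoring replayed traffic with the offline value model.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in samples:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def lookup_value(table, bucket_key, default=0.0):
    # Online: O(1) KV lookup; fall back to a default for unseen buckets.
    return table.get(bucket_key, default)

# Hypothetical buckets keyed by meal period and city.
table = build_value_table([("lunch|cityA", 1.2), ("lunch|cityA", 0.8),
                           ("dinner|cityB", 2.0)])
```

Pushing model inference offline keeps the online request path cheap, which matters since the decision itself must not consume the compute it is trying to save.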
3.1.4 Traffic‑Compute Prediction
CPU time is used as the compute metric. Offline, we bucket features, train models per bucket, and store predictions in a KV table. Online, we extract features, look up the bucket, and obtain the compute estimate.
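Since the compute a request consumes also depends on which elastic tier it is served at, the offline table plausibly needs a (traffic bucket, tier) key. A hypothetical sketch under that assumption, with invented bucket names and timings:

```python
def build_cost_table(logs):
    """Offline: average logged CPU time per (traffic bucket, elastic tier).

    logs: iterable of (bucket_key, tier_index, cpu_time_ms) tuples taken
    from request traces where the serving tier was recorded.
    """
    agg = {}
    for bucket, tier, cpu_ms in logs:
        total, n = agg.get((bucket, tier), (0.0, 0))
        agg[(bucket, tier)] = (total + cpu_ms, n + 1)
    return {key: total / n for key, (total, n) in agg.items()}

# Hypothetical traces: tier 0 (cheap) vs tier 2 (full pipeline).
cost_table = build_cost_table([
    ("lunch|cityA", 0, 12.0), ("lunch|cityA", 0, 14.0),
    ("lunch|cityA", 2, 55.0),
])
```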
3.1.5 Tier Decision
Offline λ solving replays historical traffic and finds the optimal λ using binary search. Online, each request evaluates candidate tiers, computes expected revenue, and selects the tier that maximizes the objective.
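The two halves of this step fit together as a Lagrangian relaxation, as in the DCAF formulation [1]: online, each request picks the tier maximizing value − λ·cost; offline, λ is binary‑searched on replayed traffic until total compute meets the budget. A minimal sketch (assuming per‑request value/cost vectors over tiers, and that the initial upper bound on λ is large enough to fall within budget):

```python
def best_tier(values, costs, lam):
    # Online decision: pick the tier maximizing value - lambda * cost.
    return max(range(len(values)), key=lambda j: values[j] - lam * costs[j])

def solve_lambda(requests, budget, lo=0.0, hi=100.0, iters=40):
    """Offline: binary-search lambda on replayed traffic.

    requests: list of (values, costs) pairs, one entry per tier.
    Total compute used is monotonically non-increasing in lambda, so a
    larger lambda always spends less, which makes bisection valid.
    """
    for _ in range(iters):
        lam = (lo + hi) / 2
        used = sum(costs[best_tier(values, costs, lam)]
                   for values, costs in requests)
        if used > budget:
            lo = lam  # over budget -> penalize compute harder
        else:
            hi = lam  # within budget -> try spending more
    return hi  # conservative side: guaranteed within budget
```

Usage on synthetic traffic: with ten identical requests offering a cheap tier (value 1, cost 1) and a rich tier (value 2, cost 10) against a budget of 20, the solved λ routes everything to the cheap tier, staying within budget.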
3.2 System Stability Guarantee
We provide traffic admission rules, monitoring & alerting, circuit‑breaker degradation, and asynchronous decision making to keep latency low. A PID‑based real‑time control loop adjusts compute allocation based on CPU/GPU utilization, QPS, latency percentiles, and failure rate.
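A textbook PID loop of the kind mentioned above can be sketched as follows. This is a generic controller, not Meituan's implementation; the choice to track CPU utilization and the gain values are illustrative assumptions. Its output would nudge a global multiplier that shifts traffic toward cheaper or richer elastic tiers.

```python
class PIDController:
    """Generic PID loop: drive a measured signal toward a setpoint."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measured, dt=1.0):
        # Positive output -> spend more compute; negative -> throttle back.
        error = self.setpoint - measured
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

# Hypothetical gains, targeting 60% CPU utilization.
pid = PIDController(kp=0.5, ki=0.1, kd=0.05, setpoint=0.6)
```

Each control tick would feed in the latest utilization reading; sustained overload accumulates in the integral term, so throttling strengthens the longer the system stays above target.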
4. Experiments
4.1 Experimental Setup
System compute capacity is measured over the busiest 15‑minute interval.
Baseline is the system without intelligent compute.
Traffic‑value and traffic‑compute models are trained on recent data and stored in KV tables.
Offline λ solving is performed on replayed traffic to obtain tier configurations.
4.2 Experiment 1: Resource‑Neutral, Revenue‑Increase
With the same machine resources, CPM increased by 2.3 % by allocating more compute to high‑value traffic during peaks.
4.3 Experiment 2: Revenue‑Neutral, Resource‑Reduction
By suppressing compute during lunch and dinner peaks, the experiment group used only ~60 % of the baseline's compute while keeping overall revenue stable.
5. Conclusion & Outlook
The paper presents the design of intelligent compute for Meituan's delivery advertising, covering optimal tier decision and system‑stability mechanisms. Future work will explore evolutionary algorithms and reinforcement learning for end‑to‑end compute optimization and integrate with elastic scaling systems.
6. References
[1] Jiang, B., Zhang, P., Chen, R., Luo, X., Yang, Y., Wang, G., … & Gai, K. (2020). DCAF: A Dynamic Computation Allocation Framework for Online Serving System. arXiv preprint arXiv:2006.09684.