How Meituan Optimized Delivery Ad Compute with Intelligent Power Allocation
This article describes Meituan's delivery advertising system's shift to intelligent compute allocation, detailing the business challenges, the four‑element framework for dynamic resource distribution, the optimal tier decision process, system stability mechanisms, experimental results, and future research directions.
1. Business Background
Meituan Waimai processes over 40 million orders daily, and its advertising service now spans more than ten business lines, leading to massive machine‑resource consumption. Traffic shows a clear dual‑peak pattern (lunch and dinner), causing high‑load pressure during peaks and significant compute waste during off‑peak periods, resulting in low overall compute‑allocation efficiency.
2. Overall Idea
The advertising engine uses a funnel‑style cascade architecture (recall, coarse‑ranking, fine‑ranking, mechanism). Intelligent compute aims to allocate compute differentially according to traffic value under system capacity constraints, improving efficiency and maximizing revenue. Four key elements are defined:
Traffic‑value quantification: measuring the revenue a request generates for the platform, advertisers, and users.
Traffic‑compute quantification: measuring the machine resources a request consumes, which depend on candidate‑set size, number of recall channels, model size, and link complexity.
System compute capacity quantification: the total machine resources available, obtained via stress testing.
Intelligent compute allocation: defining “elastic actions” (elastic queue, elastic model, elastic channel, elastic link) and their corresponding “elastic tiers”.
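The four elastic actions and their tiers can be thought of as a small discrete configuration space. A minimal sketch of how that might be encoded (all names, tier counts, and values here are hypothetical, not Meituan's actual configuration):

```python
from dataclasses import dataclass

# Hypothetical encoding of elastic actions and tiers. Each tier fixes one
# setting per elastic action, so the decision module picks a single tier per
# request instead of tuning four knobs independently.
@dataclass(frozen=True)
class ElasticTier:
    queue_len: int      # elastic queue: candidate-set size entering ranking
    model: str          # elastic model: which ranking model variant to run
    channels: int       # elastic channel: number of recall channels queried
    full_link: bool     # elastic link: run the full pipeline or a shortcut

TIERS = [
    ElasticTier(queue_len=200,  model="small", channels=3, full_link=False),
    ElasticTier(queue_len=500,  model="base",  channels=5, full_link=True),
    ElasticTier(queue_len=1000, model="large", channels=8, full_link=True),
]
```

Collapsing the actions into joint tiers keeps the online decision a simple argmax over a handful of options rather than a combinatorial search.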
Challenge Analysis
Problem solving: optimizing compute allocation under system constraints to maximize traffic revenue.
System stability: ensuring the intelligent compute framework remains stable and the whole pipeline runs smoothly.
Generality & extensibility: supporting both advertising recommendation and search, and allowing new business scenarios to be integrated easily.
3. Solution Design
The co‑designed framework consists of decision, collection, and control components: the decision component performs optimal tier selection, while the collection and control components underpin system stability.
3.1 Optimal Tier Decision
Improvements over the DCAF baseline include:
Using more generic traffic‑compute metrics and adding a compute‑prediction module to improve accuracy.
Combining elastic queue and elastic model actions to handle traffic that cannot be modeled by queue length alone.
3.1.1 Problem Modeling
The original DCAF treats compute solely as queue length, which is insufficient for Meituan's diverse traffic. We extend the model to incorporate additional compute factors and introduce a compute‑prediction module.
3.1.2 Decision Framework
The optimal tier decision consists of offline and online stages, including traffic‑value prediction, traffic‑compute prediction, offline λ solving (binary search on replayed traffic), and online decision (calculating the best tier per request).
3.1.3 Traffic‑Value Prediction
We use an offline XGBoost model to estimate platform and merchant revenue, store bucketed values in a KV table, and retrieve them online via a lightweight lookup.
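The bucket‑then‑lookup pattern described above can be sketched as follows. This is a simplified stand‑in, not Meituan's implementation: offline, a value model (XGBoost in the article) scores historical requests, scores are averaged per feature bucket and written to a KV store; online, prediction reduces to a single lookup. The bucket‑key format is invented for illustration.

```python
from collections import defaultdict

def build_value_table(samples):
    """Offline: aggregate model scores per bucket into a KV table.

    samples: iterable of (bucket_key, predicted_value) pairs, e.g. the
    output of scoring replayed traffic with the offline value model.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in samples:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def lookup_value(table, bucket_key, default=0.0):
    # Online: O(1) KV lookup; fall back to a default for unseen buckets.
    return table.get(bucket_key, default)

# Hypothetical buckets keyed by meal period and city.
table = build_value_table([("lunch|cityA", 1.2), ("lunch|cityA", 0.8),
                           ("dinner|cityB", 2.0)])
```

Pushing model inference offline keeps the online request path cheap, which matters since the decision itself must not consume the compute it is trying to save.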
3.1.4 Traffic‑Compute Prediction
CPU time is used as the compute metric. Offline, we bucket features, train models per bucket, and store predictions in a KV table. Online, we extract features, look up the bucket, and obtain the compute estimate.
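Since the compute a request consumes also depends on which elastic tier it is served at, the offline table plausibly needs a (traffic bucket, tier) key. A hypothetical sketch under that assumption, with invented bucket names and timings:

```python
def build_cost_table(logs):
    """Offline: average logged CPU time per (traffic bucket, elastic tier).

    logs: iterable of (bucket_key, tier_index, cpu_time_ms) tuples taken
    from request traces where the serving tier was recorded.
    """
    agg = {}
    for bucket, tier, cpu_ms in logs:
        total, n = agg.get((bucket, tier), (0.0, 0))
        agg[(bucket, tier)] = (total + cpu_ms, n + 1)
    return {key: total / n for key, (total, n) in agg.items()}

# Hypothetical traces: tier 0 (cheap) vs tier 2 (full pipeline).
cost_table = build_cost_table([
    ("lunch|cityA", 0, 12.0), ("lunch|cityA", 0, 14.0),
    ("lunch|cityA", 2, 55.0),
])
```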
3.1.5 Tier Decision
Offline λ solving replays historical traffic and finds the optimal λ using binary search. Online, each request evaluates candidate tiers, computes expected revenue, and selects the tier that maximizes the objective.
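The two halves of this step fit together as a Lagrangian relaxation, as in the DCAF formulation [1]: online, each request picks the tier maximizing value − λ·cost; offline, λ is binary‑searched on replayed traffic until total compute meets the budget. A minimal sketch (assuming per‑request value/cost vectors over tiers, and that the initial upper bound on λ is large enough to fall within budget):

```python
def best_tier(values, costs, lam):
    # Online decision: pick the tier maximizing value - lambda * cost.
    return max(range(len(values)), key=lambda j: values[j] - lam * costs[j])

def solve_lambda(requests, budget, lo=0.0, hi=100.0, iters=40):
    """Offline: binary-search lambda on replayed traffic.

    requests: list of (values, costs) pairs, one entry per tier.
    Total compute used is monotonically non-increasing in lambda, so a
    larger lambda always spends less, which makes bisection valid.
    """
    for _ in range(iters):
        lam = (lo + hi) / 2
        used = sum(costs[best_tier(values, costs, lam)]
                   for values, costs in requests)
        if used > budget:
            lo = lam  # over budget -> penalize compute harder
        else:
            hi = lam  # within budget -> try spending more
    return hi  # conservative side: guaranteed within budget
```

Usage on synthetic traffic: with ten identical requests offering a cheap tier (value 1, cost 1) and a rich tier (value 2, cost 10) against a budget of 20, the solved λ routes everything to the cheap tier, staying within budget.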
3.2 System Stability Guarantee
We provide traffic admission rules, monitoring & alerting, circuit‑breaker degradation, and asynchronous decision making to keep latency low. A PID‑based real‑time control loop adjusts compute allocation based on CPU/GPU utilization, QPS, latency percentiles, and failure rate.
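A textbook PID loop of the kind mentioned above can be sketched as follows. This is a generic controller, not Meituan's implementation; the choice to track CPU utilization and the gain values are illustrative assumptions. Its output would nudge a global multiplier that shifts traffic toward cheaper or richer elastic tiers.

```python
class PIDController:
    """Generic PID loop: drive a measured signal toward a setpoint."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measured, dt=1.0):
        # Positive output -> spend more compute; negative -> throttle back.
        error = self.setpoint - measured
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

# Hypothetical gains, targeting 60% CPU utilization.
pid = PIDController(kp=0.5, ki=0.1, kd=0.05, setpoint=0.6)
```

Each control tick would feed in the latest utilization reading; sustained overload accumulates in the integral term, so throttling strengthens the longer the system stays above target.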
4. Experiments
4.1 Experimental Setup
System compute capacity is measured over the busiest 15‑minute interval.
Baseline is the system without intelligent compute.
Traffic‑value and traffic‑compute models are trained on recent data and stored in KV tables.
Offline λ solving is performed on replayed traffic to obtain tier configurations.
4.2 Experiment 1: Resource‑Neutral, Revenue‑Increase
With the same machine resources, CPM increased by 2.3 % by allocating more compute to high‑value traffic during peaks.
4.3 Experiment 2: Revenue‑Neutral, Resource‑Reduction
By suppressing compute during lunch and dinner peaks, the experiment group used only ~60 % of the baseline's compute while keeping overall revenue stable.
5. Conclusion & Outlook
The paper presents the design of intelligent compute for Meituan's delivery advertising, covering optimal tier decision and system‑stability mechanisms. Future work will explore evolutionary algorithms and reinforcement learning for end‑to‑end compute optimization and integrate with elastic scaling systems.
6. References
[1] Jiang, B., Zhang, P., Chen, R., Luo, X., Yang, Y., Wang, G., … & Gai, K. (2020). DCAF: A Dynamic Computation Allocation Framework for Online Serving System. arXiv preprint arXiv:2006.09684.