Multi-Action Computation Allocation via Evolutionary Strategies in Meituan Takeaway Advertising
This article analyzes Meituan's delivery advertising system, detailing the shift from linear programming to an evolutionary‑strategy‑based multi‑action allocation (ES‑MACA), describing problem formalization, offline training, reward evaluation, online decision flow, extensive offline and online experiments, and future directions toward reinforcement learning.
In the fast‑growing Meituan delivery business, advertising traffic creates a new bottleneck: computation resources become scarce during peak hours while remaining under‑utilized off‑peak. The goal of "intelligent computation" is to allocate these resources in a fine‑grained, personalized way so that revenue is maximized under an overall compute constraint.
Overall Approach
The first phase used a linear‑programming solution (DCAF) to allocate elastic queues. The second phase introduces ES‑MACA (Evolutionary Strategies based Multi‑Action Computation Allocation), which jointly decides three actions across the full ad‑serving funnel: elastic channel, elastic queue length, and elastic model.
Problem Formalization
Given M decision modules, the objective is to select a compute tier for each module so that total compute stays within a budget while overall traffic revenue is maximized. The source expresses this as a constrained optimization problem (shown there as a diagram) and models it as a Markov Decision Process (MDP) or a Partially Observable MDP (POMDP).
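Because the source diagram is not reproduced here, the following is only one plausible way to write the constrained problem described above. All notation is assumed for illustration and is not taken from the source: $i$ indexes the $N$ requests, $j$ indexes the $M$ modules, $a_{ij}$ is the compute tier chosen for request $i$ at module $j$, $Q_i$ is the expected revenue of request $i$, $q_{ij}$ is the compute cost of the chosen tier, and $C$ is the total compute budget.

```latex
\max_{\{a_{ij}\}} \; \sum_{i=1}^{N} Q_i\left(a_{i1}, \dots, a_{iM}\right)
\quad \text{s.t.} \quad
\sum_{i=1}^{N} \sum_{j=1}^{M} q_{ij}\left(a_{ij}\right) \le C
```

Writing $Q_i$ as a function of all $M$ choices at once reflects the point that the revenue of a request depends on the actions jointly, not on each module in isolation.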
Challenges
Generality: Existing linear‑programming methods are tightly coupled to specific business constraints; any change requires costly re‑modeling and strong data assumptions.
Sequential Decision: Actions at earlier stages affect the state for later stages, making it impossible to evaluate a single module in isolation.
Evolutionary‑Strategy Solution
Evolutionary algorithms (EA) are less prone to getting trapped in local optima than gradient‑based methods, parallelize naturally, handle non‑convex and discontinuous problems, and require little prior knowledge. ES‑MACA uses the Cross‑Entropy Method (CEM) to evolve agent parameters that maximize a reward made up of business revenue minus a penalty for exceeding compute limits.
Offline Training
Randomly initialize the mean and variance of the distribution over agent parameters, then sample a population of parameter sets from it.
For each sample, run an offline simulator that replays historical traffic, letting the agent interact with the system to produce a sequence of actions.
Estimate revenue using a multi‑task DNN that predicts exposure, click, and order values.
Rank samples by reward (expected revenue adjusted by the compute penalty), keep the top‑K, refit the mean and variance to those top‑K samples, and repeat until convergence.
The simulator reproduces the online interaction logic (recall → queue truncation → model selection) and provides a reward for each parameter set.
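The training loop above can be sketched as a generic Cross‑Entropy Method. This is a minimal illustration, not Meituan's implementation: `simulate_reward` is a hypothetical stand‑in for the offline simulator (here a toy quadratic objective), and the population size, top‑K, iteration count, and noise floor are all assumed values.

```python
import random
import statistics

def simulate_reward(params):
    """Hypothetical stand-in for the offline simulator.
    Toy objective whose maximum is at params == [1.0, -2.0]."""
    target = [1.0, -2.0]
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def cem(reward_fn, dim, pop_size=50, top_k=10, iters=40, seed=0):
    rng = random.Random(seed)
    # Distribution over agent parameters: independent Gaussian per dimension.
    mean = [0.0] * dim
    std = [2.0] * dim
    for _ in range(iters):
        # Sample a population of candidate parameter sets.
        population = [
            [rng.gauss(mean[d], std[d]) for d in range(dim)]
            for _ in range(pop_size)
        ]
        # Score each candidate and keep the top-K (the elites).
        elites = sorted(population, key=reward_fn, reverse=True)[:top_k]
        # Refit mean and standard deviation to the elites.
        for d in range(dim):
            values = [e[d] for e in elites]
            mean[d] = statistics.fmean(values)
            # Small floor keeps the search from collapsing to a point.
            std[d] = statistics.pstdev(values) + 1e-3
    return mean

best = cem(simulate_reward, dim=2)
```

Each candidate is scored independently, so in a real setting the simulator runs for one generation could be distributed across workers, which is the parallelism the text refers to.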
Reward Evaluation
The reward combines the estimated business revenue with a compute‑budget penalty; stricter constraints increase the penalty coefficient.
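One simple way to combine revenue with a budget penalty is a hinge term that is zero while compute stays within budget and grows linearly with the overrun. The source does not give the exact form, so the function below, its name, and its parameters are assumptions for illustration.

```python
def reward(revenue, compute_used, compute_budget, penalty_coef):
    """Estimated revenue minus a linear penalty on compute overrun.
    The hinge form is an assumed example, not the source's formula."""
    overrun = max(0.0, compute_used - compute_budget)
    return revenue - penalty_coef * overrun

within_budget = reward(100.0, 80.0, 100.0, penalty_coef=2.0)
over_budget = reward(100.0, 120.0, 100.0, penalty_coef=2.0)
```

Raising `penalty_coef` makes any overrun more costly, which corresponds to enforcing the constraint more strictly.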
Online Decision
At runtime, the offline‑optimized agent receives the current traffic state, decides the elastic channel, then the queue length, and finally the model index, each step feeding back the updated state to the next decision.
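The chained decision flow can be sketched as three steps that each read the state left by the previous one. The state fields, thresholds, and rule‑based policies below are hypothetical placeholders; in ES‑MACA the decisions come from the offline‑optimized agent rather than hand‑written rules.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class State:
    traffic_value: float   # hypothetical estimate of request value, 0..1
    system_load: float     # hypothetical current load, 0..1
    channels: int = 0      # number of recall channels chosen
    queue_length: int = 0  # candidates kept after truncation
    model_index: int = 0   # 0 = lighter model, 1 = heavier model (assumed)

def decide_channels(s):
    # Spend more recall channels on valuable traffic when load allows.
    n = 3 if s.traffic_value > 0.5 and s.system_load < 0.8 else 1
    return replace(s, channels=n)

def decide_queue_length(s):
    # Depends on the channel decision made upstream.
    per_channel = 50 if s.system_load < 0.8 else 20
    return replace(s, queue_length=per_channel * s.channels)

def decide_model(s):
    # Depends on the queue length: heavier model only for short queues.
    return replace(s, model_index=1 if s.queue_length <= 100 else 0)

def decide(s):
    for step in (decide_channels, decide_queue_length, decide_model):
        s = step(s)  # each step feeds its updated state to the next
    return s

peak_low_value = decide(State(traffic_value=0.2, system_load=0.9))
offpeak_high_value = decide(State(traffic_value=0.9, system_load=0.3))
```

Because later steps read fields written by earlier ones, changing the channel rule alone changes the queue length and model choice as well, which is why the modules cannot be tuned in isolation.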
System Construction
The platform adds lightweight multi‑action decision capability, fine‑grained PID feedback control for compute tiers, and standardizes actions, data, and workflow. Actions such as "experiment", "feature fetch", "parameter handling", and "ES‑MACA decision" are abstracted as reusable components, enabling rapid integration across dozens of business lines.
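As a rough illustration of PID feedback control over compute, the sketch below adjusts a throttle factor so that measured utilization settles at a target. The gains, the target, and the toy relationship between the throttle and utilization are assumptions; the source does not describe how its controller is configured.

```python
class PID:
    """Textbook discrete PID controller."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement, dt=1.0):
        error = self.setpoint - measurement
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Assumed setup: keep utilization at 70% by scaling the compute tier.
pid = PID(kp=0.5, ki=0.1, kd=0.0, setpoint=0.7)
scale = 1.0  # 1.0 = full compute tier; lower values throttle
for _ in range(60):
    utilization = 0.9 * scale  # toy plant: demand of 0.9 at full scale
    # Apply the controller output as an adjustment, clamped to a valid range.
    scale = min(1.0, max(0.1, scale + pid.update(utilization)))

final_utilization = 0.9 * scale
```

In this toy setup the throttle settles near 0.78, the value at which utilization equals the 0.7 target.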
Experiments
Offline Experiments
The offline comparison covered a fixed‑decision baseline, three single‑action variants (elastic channel only, elastic queue only, elastic model only), modular optimal learning, and the full ES‑MACA joint optimization. Joint optimization outperformed modular learning by 0.53% in revenue, which is consistent with interaction effects among the actions.
Online A/B Test
A week‑long online test compared three groups: a baseline with no intelligent allocation, elastic queue only, and ES‑MACA with full joint allocation. ES‑MACA achieved higher CPM and revenue (+~1.x%) while meeting latency targets: TP99 latency rose by 1.8 ms and TP999 by 2.6 ms.
Conclusion and Outlook
The article documents the evolution from linear programming to an EA‑based multi‑action allocation and reports measurable gains in Meituan's delivery ad system. Future work includes exploring reinforcement‑learning agents for finer‑grained joint optimization and extending the framework to the online, near‑line, and offline layers.
References
Jiang et al., "DCAF: A Dynamic Computation Allocation Framework for Online Serving System" (2020).
Yang et al., "Computation Resource Allocation Solution in Recommender Systems" (2021).
Meituan advertising series (various internal reports).
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
