Multi-Action Computation Allocation via Evolutionary Strategies in Meituan Takeaway Advertising

This article examines the compute-allocation system behind Meituan's delivery advertising and its shift from a linear-programming approach to ES-MACA, an evolutionary-strategy-based multi-action allocation method. It covers problem formalization, offline training, reward evaluation, the online decision flow, offline and online experiments, and future directions toward reinforcement learning.

Meituan Technology Team

In the fast-growing Meituan delivery business, advertising traffic creates a new bottleneck: computation resources become scarce during peak hours while remaining under-utilized off-peak. The goal of "intelligent computation" is to allocate compute in a fine-grained, per-request manner so that revenue is maximized under an overall compute budget.

Overall Approach

The first phase used a linear‑programming solution (DCAF) to allocate elastic queues. This second phase introduces ES‑MACA (Evolutionary Strategies based Multi‑Action Computation Allocation), which jointly decides three actions—elastic channel, elastic queue length, and elastic model—across the full ad‑serving funnel.

Problem Formalization

Given M decision modules, the objective is to select a compute tier for each module so that total compute stays within a budget while maximizing overall traffic revenue. The formulation is expressed as a constrained optimization problem (see the diagram in the source) and modeled as a Markov Decision Process (MDP) or Partially Observable MDP.
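The source presents the formulation only as a diagram, which is not reproduced here. One plausible way to write the problem, with every symbol below introduced as an assumption rather than the authors' notation, is:

```latex
\max_{\{a_{i,j}\}} \; \sum_{i=1}^{N} Q_i\bigl(a_{i,1},\dots,a_{i,M}\bigr)
\quad \text{s.t.} \quad
\sum_{i=1}^{N} \sum_{j=1}^{M} q_j\bigl(a_{i,j}\bigr) \le C
```

Here N is the number of requests in the budgeting window, a_{i,j} is the compute tier chosen for request i at module j, Q_i is the revenue of request i, q_j is the compute cost of a tier at module j, and C is the total compute budget. Q_i depends on all M choices jointly, which is what makes the problem sequential rather than separable per module.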

Challenges

Generality: Existing linear‑programming methods are tightly coupled to specific business constraints; any change requires costly re‑modeling and strong data assumptions.

Sequential Decision: Actions at earlier stages affect the state for later stages, making it impossible to evaluate a single module in isolation.

Evolutionary‑Strategy Solution

Evolutionary algorithms (EA) are less prone to getting trapped in local optima, parallelize naturally, handle non-convex and discontinuous objectives, and require little prior knowledge of the problem. ES-MACA uses the Cross-Entropy Method (CEM) to evolve agent parameters that maximize a reward combining business revenue with a penalty for exceeding compute limits.

Offline Training

Initialize the mean and variance of the sampling distribution over agent parameters, then draw a population of candidate parameter sets from it.

For each sample, run an offline simulator that replays historical traffic, letting the agent interact with the system to produce a sequence of actions.

Estimate revenue using a multi‑task DNN that predicts exposure, click, and order values.

Rank samples by reward (expected revenue, adjusted by the compute penalty), keep the top-K, refit the mean and variance to those top-K samples, and repeat until convergence.

The simulator reproduces the online interaction logic (recall → queue truncation → model selection) and provides a reward for each parameter set.
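The training loop above can be sketched as a small Cross-Entropy Method routine. This is a minimal illustration under stated assumptions, not Meituan's implementation: `cem_optimize`, `toy_reward`, and `TARGET` are hypothetical names, the sampling distribution is assumed to be a diagonal Gaussian, and the toy reward stands in for the offline simulator plus revenue estimator.

```python
import random
import statistics


def cem_optimize(reward_fn, dim, population=100, top_k=20, iterations=30, seed=0):
    """Cross-Entropy Method over a diagonal Gaussian on agent parameters."""
    rng = random.Random(seed)
    mean = [0.0] * dim  # initial mean of the sampling distribution
    std = [3.0] * dim   # deliberately wide initial standard deviation
    for _ in range(iterations):
        # Sample a population of candidate parameter vectors.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(population)]
        # Score every candidate and keep the top-K (the "elites").
        ranked = sorted(samples, key=reward_fn, reverse=True)
        elites = ranked[:top_k]
        # Refit mean and standard deviation to the elites.
        for d in range(dim):
            column = [e[d] for e in elites]
            mean[d] = statistics.fmean(column)
            # Small floor keeps the distribution from collapsing to a point.
            std[d] = statistics.pstdev(column) + 1e-3
    return mean


# Hypothetical stand-in for "replay traffic in the simulator and estimate reward":
# the reward peaks when the parameters equal TARGET.
TARGET = [1.0, -2.0, 0.5]


def toy_reward(theta):
    return -sum((t - g) ** 2 for t, g in zip(theta, TARGET))


best_theta = cem_optimize(toy_reward, dim=3)
```

In a real setting `reward_fn` would be the expensive part, since each call replays traffic through the simulator. Because candidates are scored independently, that step parallelizes across workers, which is one of the advantages the article attributes to evolutionary methods.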

Reward Evaluation

The reward combines the estimated business revenue with a compute‑budget penalty; stricter constraints increase the penalty coefficient.
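A minimal sketch of such a reward follows. The article does not give the exact functional form, so the linear hinge penalty and the names `penalized_reward` and `penalty_coef` are assumptions made for illustration.

```python
def penalized_reward(revenue, compute_used, compute_budget, penalty_coef):
    """Estimated revenue minus a penalty proportional to budget overshoot.

    Assumed form: no penalty at or below budget, linear penalty above it.
    """
    overshoot = max(0.0, compute_used - compute_budget)
    return revenue - penalty_coef * overshoot


within_budget = penalized_reward(100.0, 80.0, 100.0, penalty_coef=2.0)
over_budget = penalized_reward(100.0, 110.0, 100.0, penalty_coef=2.0)
```

Raising `penalty_coef` makes parameter sets that overshoot the budget rank lower during training, which matches the article's note that stricter constraints call for a larger coefficient.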

Online Decision

At runtime, the offline‑optimized agent receives the current traffic state, decides the elastic channel, then the queue length, and finally the model index, each step feeding back the updated state to the next decision.
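The chained decision flow might look like the sketch below. Everything in it is hypothetical: the `State` fields, the option tables in `STAGES`, and the scoring rule (diminishing returns on compute, priced by load and by compute already spent) are illustrative assumptions, not the production policy.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    traffic_value: float  # assumed: predicted value of the request
    system_load: float    # assumed: load signal in [0, 1]
    spent: float = 0.0    # compute committed by earlier stages


# Assumed option tables: (action, relative compute cost), in funnel order.
STAGES = {
    "channel": [(1, 1.0), (2, 2.0), (3, 3.0)],
    "queue_length": [(50, 1.0), (100, 2.0), (200, 4.0)],
    "model_index": [(0, 1.0), (1, 3.0)],
}


def choose(state, options, theta):
    """Pick the option with the best value-versus-cost score."""
    value_w, load_w, spent_w = theta

    def score(option):
        _, cost = option
        benefit = value_w * state.traffic_value * math.log1p(cost)
        price = (load_w * state.system_load + spent_w * state.spent) * cost
        return benefit - price

    return max(options, key=score)


def decide(state, thetas):
    """Run the stages in order, feeding the updated state to the next one."""
    decisions = {}
    for name, options in STAGES.items():
        action, cost = choose(state, options, thetas[name])
        decisions[name] = action
        state = replace(state, spent=state.spent + cost)
    return decisions


thetas = {name: (1.0, 1.0, 0.1) for name in STAGES}
valuable_request = decide(State(traffic_value=5.0, system_load=0.1), thetas)
cheap_request = decide(State(traffic_value=0.5, system_load=0.9), thetas)
```

The `thetas` dictionary plays the role of the parameters evolved offline. Because `spent` grows after each stage, a generous early choice makes later stages more conservative, which is the coupling between stages that the article describes.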

System Construction

The platform adds lightweight multi‑action decision capability, fine‑grained PID feedback control for compute tiers, and standardizes actions, data, and workflow. Actions such as "experiment", "feature fetch", "parameter handling", and "ES‑MACA decision" are abstracted as reusable components, enabling rapid integration across dozens of business lines.
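The article does not describe how its PID feedback loop is configured. As a generic illustration only, a textbook discrete PID controller that nudges a compute-tier scaling signal toward a target utilization could look like this (the class name `PID`, the gains, and the 0.7 setpoint are all assumptions):

```python
class PID:
    """Textbook discrete PID controller."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement, dt=1.0):
        error = self.setpoint - measurement
        self.integral += error * dt
        # No derivative term on the first call, when there is no previous error.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Target 70% utilization; a measured 90% yields a negative correction,
# i.e. a signal to step compute tiers down.
controller = PID(kp=0.5, ki=0.1, kd=0.0, setpoint=0.7)
correction = controller.update(0.9)
```

A negative `correction` would be read as "reduce compute" and a positive one as "headroom available". How that signal maps onto concrete tiers is system-specific and not covered by the source.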

Experiments

Offline Experiments

A fixed-decision baseline was compared against three single-action variants (elastic channel only, elastic queue only, elastic model only), module-wise optimization in which each module is learned separately, and the full ES-MACA joint optimization. Joint optimization outperformed module-wise learning by 0.53% in revenue, which indicates that the actions interact and should not be tuned in isolation.

Online A/B Test

A week-long online test compared three groups: a baseline with no intelligent allocation, elastic queue only, and ES-MACA with full joint allocation. ES-MACA achieved higher CPM and revenue (+~1.x%) while staying within latency targets (TP99 +1.8 ms, TP999 +2.6 ms).

Conclusion and Outlook

This article documents the evolution from linear programming to an evolutionary-strategy-based multi-action allocation, with measurable gains in Meituan's delivery ad system. Future work includes exploring reinforcement-learning agents for finer-grained joint optimization and extending the framework to the online, near-line, and offline layers.

References

Jiang et al., "DCAF: A Dynamic Computation Allocation Framework for Online Serving System" (2020).

Yang et al., "Computation Resource Allocation Solution in Recommender Systems" (2021).

Meituan advertising series (various internal reports).

