MaRCA: Multi‑Agent Reinforcement Learning Computation Allocation for Full‑Chain Advertising Systems
The article presents MaRCA, a multi‑agent reinforcement learning framework that allocates limited computation resources across the entire advertising recommendation pipeline. By modeling user value, compute consumption, and action reward, it raises ad revenue while keeping system load stable under fluctuating traffic and widely varying request values.
With the rapid growth of JD’s external advertising business, billions of user requests must be served with sub‑second latency. Because traffic fluctuates and request values differ widely, computation resources become a severe bottleneck.
To address this, the authors formulate the full‑chain compute‑allocation problem as a multi‑agent reinforcement learning (MARL) task: given the system state s_t, choose an action combination a_t that maximizes the reward R(s_t, a_t) = Q(s_t, a_t) − λ·C(s_t, a_t) while respecting the per‑module load constraints C_m. The state space includes user features, traffic characteristics, and IDC information; actions are categorized as link‑selection, switch, and queue decisions.
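To make the reward concrete, here is a minimal sketch of how one request could be scored once Q and C are available for each candidate action. The function name `select_action` and the example numbers are hypothetical; the article does not give an implementation, and the real system chooses a combination of actions across modules rather than a single index.

```python
from typing import Sequence


def select_action(q_values: Sequence[float],
                  costs: Sequence[float],
                  lam: float) -> int:
    """Return the index of the action that maximizes Q - lam * C.

    q_values[i] is the estimated value Q(s_t, a_i) and costs[i] the
    estimated compute cost C(s_t, a_i) for candidate action a_i.
    Both lists are assumed to come from upstream estimators.
    """
    if not q_values or len(q_values) != len(costs):
        raise ValueError("q_values and costs must be non-empty and equal length")
    # Reward from the text: R = Q - lambda * C, evaluated per candidate.
    rewards = [q - lam * c for q, c in zip(q_values, costs)]
    return max(range(len(rewards)), key=rewards.__getitem__)


# Illustrative (made-up) candidates: cheap, medium, and expensive pipelines.
q = [1.0, 3.0, 5.0]
c = [1.0, 4.0, 10.0]
choice_when_idle = select_action(q, c, lam=0.0)   # compute is free
choice_when_busy = select_action(q, c, lam=1.0)   # compute is expensive
```

With λ = 0 the most valuable action wins regardless of cost; as λ grows, cheaper actions are preferred, which is how a single scalar can throttle compute across many requests.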
MaRCA consists of four tightly coupled modules: (1) a user‑value estimator that predicts the ad‑revenue potential of each request, (2) a compute estimator that predicts the CPU consumption of each action, (3) an action‑value estimator that predicts advertising consumption, and (4) a load‑aware decision module that dynamically adjusts the trade‑off factor λ based on real‑time CPU load and elastic‑degradation signals.
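The load‑aware adjustment of λ can be pictured as a simple feedback loop. The sketch below is an assumption, not the authors' controller: it uses a proportional update toward a target CPU utilization, and the names `update_lambda`, `target_load`, `step`, and `degrade_bump` are hypothetical.

```python
def update_lambda(lam: float,
                  cpu_load: float,
                  target_load: float = 0.7,
                  step: float = 0.5,
                  degraded: bool = False,
                  degrade_bump: float = 0.2) -> float:
    """Hypothetical proportional update for the trade-off factor lambda.

    cpu_load and target_load are utilizations in [0, 1]. The gains are
    illustrative defaults, not values taken from the article.
    """
    # Raise lambda when load exceeds the target, lower it when there is headroom.
    new_lam = lam + step * (cpu_load - target_load)
    # Assumed handling of an elastic-degradation signal: push lambda up further.
    if degraded:
        new_lam += degrade_bump
    # Lambda acts as a non-negative price on compute, so clamp it at zero.
    return max(0.0, new_lam)


# Overloaded system: the price of compute goes up.
lam_hot = update_lambda(1.0, cpu_load=0.9)
# Idle system: the price falls but never below zero.
lam_cold = update_lambda(0.05, cpu_load=0.3)
```

A higher λ makes every module favor cheaper actions on the next decision, so load is pulled back toward the target without changing the estimators themselves.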
The optimization is expressed as a constrained linear program and solved via Lagrangian duality, yielding closed‑form expressions for the optimal action values and the optimal λ‑adjustment policy.
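The duality step can be sketched as follows. This is an illustrative derivation under simplifying assumptions that are not in the article: a single shared compute budget C stands in for the per‑module constraints C_m, i indexes requests, a indexes candidate actions, and x_{i,a} is the relaxed share of request i assigned to action a.

```latex
% Assumed notation: Q_{i,a} and C_{i,a} are the estimated value and
% compute cost of action a for request i; C is one shared budget.
\begin{aligned}
\max_{x}\quad & \sum_{i}\sum_{a} x_{i,a}\, Q_{i,a} \\
\text{s.t.}\quad & \sum_{i}\sum_{a} x_{i,a}\, C_{i,a} \le C, \qquad
                   \sum_{a} x_{i,a} = 1 \;\; \forall i, \qquad x_{i,a} \ge 0 \\[4pt]
L(x,\lambda) &= \sum_{i}\sum_{a} x_{i,a}\,\bigl(Q_{i,a} - \lambda\, C_{i,a}\bigr) + \lambda\, C,
               \qquad \lambda \ge 0 \\[4pt]
a_i^{*}(\lambda) &= \arg\max_{a}\,\bigl(Q_{i,a} - \lambda\, C_{i,a}\bigr)
\end{aligned}
```

For a fixed λ the Lagrangian separates by request, so each request independently picks the action with the largest Q − λC, which matches the reward R defined earlier. The multiplier λ then plays the role of a price on compute: it is raised when the budget is exceeded and lowered when capacity is idle.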
Extensive offline and online experiments, including deployment during the 618 and 11.11 shopping festivals in 2024, show that MaRCA improves ad consumption by 14.93% without increasing compute resources, while significantly reducing system risk and improving reliability.
As future work, the authors plan to incorporate model‑predictive control for proactive λ prediction, expand the action space with model‑selection and filtering decisions, and generalize the approach to other recommendation pipelines facing tight compute budgets.
JD Tech