MaRCA: Multi‑Agent Reinforcement Learning Computation Allocation for Full‑Chain Advertising Systems

The article presents MaRCA, a multi‑agent reinforcement learning framework that models user value, compute consumption, and action reward to allocate limited computation resources across the entire advertising recommendation pipeline, achieving higher ad revenue while keeping system load stable under fluctuating traffic and diverse request values.

JD Tech

Apr 8, 2025

With the rapid growth of JD’s external advertising business, billions of user requests must be processed within sub‑second latency, creating a severe computation‑resource challenge when traffic fluctuates and request values differ widely.

To address this, the authors formulate the full‑chain compute‑allocation problem as a multi‑agent reinforcement learning (MARL) task: given the system state s_t, choose an action combination a_t that maximizes the reward R(s_t, a_t) = Q(s_t, a_t) − λ·C(s_t, a_t) while respecting the per‑module load constraints C_m. The state space includes user features, traffic characteristics, and IDC information; actions are categorized as link‑selection, switch, and queue decisions.
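As a rough illustration of the reward above, the sketch below picks, for a single request, the candidate action with the highest value minus λ times cost. It assumes the value and cost estimates are already available as plain numbers; the `Candidate` class, the action names, and the figures are hypothetical and not taken from the article, and the per‑module load constraints are assumed to be absorbed into λ rather than checked explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate action combination for a request (hypothetical structure)."""
    name: str     # illustrative label, not from the article
    value: float  # estimated value term Q(s_t, a_t)
    cost: float   # estimated compute term C(s_t, a_t)


def best_action(candidates, lam):
    """Return the candidate maximizing value - lam * cost."""
    if not candidates:
        raise ValueError("at least one candidate action is required")
    return max(candidates, key=lambda c: c.value - lam * c.cost)


# Made-up numbers purely for illustration.
candidates = [
    Candidate("skip", value=0.0, cost=0.0),
    Candidate("light_chain", value=1.0, cost=2.0),
    Candidate("full_chain", value=1.5, cost=5.0),
]

for lam in (0.1, 0.3, 1.0):
    print(lam, best_action(candidates, lam).name)
```

A larger λ makes compute more expensive relative to value, so the chosen action shifts toward cheaper options as load rises.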

MaRCA consists of four tightly coupled modules: (1) a user‑value estimator that predicts ad‑revenue potential per request, (2) a compute‑estimator that predicts CPU consumption for each action, (3) an action‑value estimator that predicts advertising consumption, and (4) a load‑aware decision module that dynamically adjusts the trade‑off factor λ based on real‑time CPU load and elastic‑degradation signals.
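The article does not spell out how the load‑aware module updates λ. One simple way to picture it, offered only as an assumption, is a proportional feedback rule: raise λ when CPU load exceeds a target, lower it when there is headroom, and apply an extra step when an elastic‑degradation signal is active. The function name, the target load, the gain, the bounds, and the degradation step below are all hypothetical.

```python
def update_lambda(lam, cpu_load, *, target_load=0.7, gain=0.5,
                  degraded=False, degrade_step=0.5,
                  lam_min=0.0, lam_max=10.0):
    """Hypothetical proportional update of the trade-off factor.

    lam       : current trade-off factor
    cpu_load  : observed CPU utilization in [0, 1]
    degraded  : True when an elastic-degradation signal is active
    """
    # Positive error means the system is above its target load.
    error = cpu_load - target_load
    new_lam = lam + gain * error
    if degraded:
        # Assumption: degradation makes compute extra expensive.
        new_lam += degrade_step
    # Keep the factor inside a safe operating range.
    return min(lam_max, max(lam_min, new_lam))


lam = 1.0
for load in (0.9, 0.9, 0.6, 0.5):
    lam = update_lambda(lam, load)
    print(round(lam, 3))
```

The clamp keeps a burst of overload or a long idle period from driving λ to an extreme that would take many steps to recover from.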

The optimization is expressed as a constrained linear program and solved via Lagrangian duality, yielding closed‑form expressions for the optimal action values and the optimal λ‑adjustment policy.
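The article does not reproduce the derivation, so the following is only a sketch of how a Lagrangian relaxation of such a program typically looks. The indices i (requests), j (candidate actions), and m (modules), the allocation variable x_{ij}, and the per‑module cost c_{ijm} are notation assumed here for illustration; Q_{ij} and C_m correspond to the value term and the per‑module load constraints mentioned in the text.

```latex
% Primal: pick one action per request, within each module's compute budget
\begin{aligned}
\max_{x}\quad & \sum_{i}\sum_{j} x_{ij}\, Q_{ij} \\
\text{s.t.}\quad & \sum_{i}\sum_{j} x_{ij}\, c_{ijm} \le C_m \quad \forall m, \\
& \sum_{j} x_{ij} = 1,\quad x_{ij} \ge 0 \quad \forall i, j.
\end{aligned}

% Lagrangian with one multiplier per module budget
L(x, \lambda) = \sum_{i}\sum_{j} x_{ij}\Bigl(Q_{ij} - \sum_{m} \lambda_m c_{ijm}\Bigr)
              + \sum_{m} \lambda_m C_m, \qquad \lambda_m \ge 0.

% For fixed multipliers the problem decouples per request
j^{*}(i) = \arg\max_{j}\Bigl(Q_{ij} - \sum_{m} \lambda_m c_{ijm}\Bigr).
```

With a single shared multiplier this per‑request rule reduces to maximizing Q − λC, which matches the form of the reward used in the problem formulation.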

Extensive offline and online experiments, including the 2024 618 and 11.11 shopping festivals, demonstrate that MaRCA improves ad consumption by 14.93% without increasing compute resources, while significantly reducing system risk and improving reliability.

For future work, the authors plan to incorporate model‑predictive control for proactive λ prediction, expand the action space with model‑selection and filtering decisions, and generalize the approach to other recommendation pipelines facing tight compute budgets.
